Thread: Implementing Incremental View Maintenance
Hi, I would like to implement Incremental View Maintenance (IVM) on PostgreSQL. IVM is a technique to maintain materialized views which computes and applies only the incremental changes to the materialized views rather than recomputate the contents as the current REFRESH command does. I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1]. Our implementation uses row OIDs to compute deltas for materialized views. The basic idea is that if we have information about which rows in base tables are contributing to generate a certain row in a matview then we can identify the affected rows when a base table is updated. This is based on an idea of Dr. Masunaga [2] who is a member of our group and inspired from ID-based approach[3]. In our implementation, the mapping of the row OIDs of the materialized view and the base tables are stored in "OID map". When a base relation is modified, AFTER trigger is executed and the delta is recorded in delta tables using the transition table feature. The accual udpate of the matview is triggerd by REFRESH command with INCREMENTALLY option. However, we realize problems of our implementation. First, WITH OIDS will be removed since PG12, so OIDs are no longer available. Besides this, it would be hard to implement this since it needs many changes of executor nodes to collect base tables's OIDs during execuing a query. Also, the cost of maintaining OID map would be high. For these reasons, we started to think to implement IVM without relying on OIDs and made a bit more surveys. We also looked at Kevin Grittner's discussion [4] on incremental matview maintenance. In this discussion, Kevin proposed to use counting algorithm [5] to handle projection views (using DISTNICT) properly. This algorithm need an additional system column, count_t, in materialized views and delta tables of base tables. However, the discussion about IVM is now stoped, so we would like to restart and progress this. Through our PoC inplementation and surveys, I think we need to think at least the followings for implementing IVM. 1. How to extract changes on base tables I think there would be at least two approaches for it. - Using transition table in AFTER triggers - Extracting changes from WAL using logical decoding In our PoC implementation, we used AFTER trigger and transition tables, but using logical decoding might be better from the point of performance of base table modification. If we can represent a change of UPDATE on a base table as query-like rather than OLD and NEW, it may be possible to update the materialized view directly instead of performing delete & insert. 2. How to compute the delta to be applied to materialized views Essentially, IVM is based on relational algebra. Theorically, changes on base tables are represented as deltas on this, like "R <- R + dR", and the delta on the materialized view is computed using base table deltas based on "change propagation equations". For implementation, we have to derive the equation from the view definition query (Query tree, or Plan tree?) and describe this as SQL query to compulte delta to be applied to the materialized view. There could be several operations for view definition: selection, projection, join, aggregation, union, difference, intersection, etc. If we can prepare a module for each operation, it makes IVM extensable, so we can start a simple view definition, and then support more complex views. 3. How to identify rows to be modifed in materialized views When applying the delta to the materialized view, we have to identify which row in the matview is corresponding to a row in the delta. A naive method is matching by using all columns in a tuple, but clearly this is unefficient. If thematerialized view has unique index, we can use this. Maybe, we have to force materialized views to have all primary key colums in their base tables. In our PoC implementation, we used OID to identify rows, but this will be no longer available as said above. 4. When to maintain materialized views There are two candidates of the timing of maintenance, immediate (eager) or deferred. In eager maintenance, the materialized view is updated in the same transaction where the base table is updated. In deferred maintenance, this is done after the transaction is commited, for example, when view is accessed, as a response to user request, etc. In the previous discussion[4], it is planned to start from "eager" approach. In our PoC implementaion, we used the other aproach, that is, using REFRESH command to perform IVM. I am not sure which is better as a start point, but I begin to think that the eager approach may be more simple since we don't have to maintain base table changes in other past transactions. In the eager maintenance approache, we have to consider a race condition where two different transactions change base tables simultaneously as discussed in [4]. [1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ [2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japaneseonly) [3] https://dl.acm.org/citation.cfm?id=2750546 [4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com [5] https://dl.acm.org/citation.cfm?id=170066 Regards, -- Yugo Nagata <nagata@sraoss.co.jp>
Hi Yugo. > I would like to implement Incremental View Maintenance (IVM) on > PostgreSQL. Great. :-) I think it would address an important gap in PostgreSQL’s feature set. > 2. How to compute the delta to be applied to materialized views > > Essentially, IVM is based on relational algebra. Theorically, changes on > base > tables are represented as deltas on this, like "R <- R + dR", and the > delta on > the materialized view is computed using base table deltas based on "change > propagation equations". For implementation, we have to derive the > equation from > the view definition query (Query tree, or Plan tree?) and describe this as > SQL > query to compulte delta to be applied to the materialized view. We had a similar discussion in this thread https://www.postgresql.org/message-id/flat/FC784A9F-F599-4DCC-A45D-DBF6FA582D30%40QQdd.eu, and I’m very much in agreement that the "change propagation equations” approach can solve for a very substantial subset of common MV use cases. > There could be several operations for view definition: selection, > projection, > join, aggregation, union, difference, intersection, etc. If we can > prepare a > module for each operation, it makes IVM extensable, so we can start a > simple > view definition, and then support more complex views. Such a decomposition also allows ’stacking’, allowing complex MV definitions to be attacked even with only a small handful of modules. I did a bit of an experiment to see if "change propagation equations” could be computed directly from the MV’s pg_node_tree representation in the catalog in PlPgSQL. I found that pg_node_trees are not particularly friendly to manipulation in PlPgSQL. Even with a more friendly-to-PlPgSQL representation (I played with JSONB), then the next problem is making sense of the structures, and unfortunately amongst the many plan/path/tree utility functions in the code base, I figured only a very few could be sensibly exposed to PlPgSQL. Ultimately, although I’m still attracted to the idea, and I think it could be made to work, native code is the way to go at least for now. > 4. When to maintain materialized views > > [...] > > In the previous discussion[4], it is planned to start from "eager" > approach. In our PoC > implementaion, we used the other aproach, that is, using REFRESH command > to perform IVM. > I am not sure which is better as a start point, but I begin to think that > the eager > approach may be more simple since we don't have to maintain base table > changes in other > past transactions. Certainly the eager approach allows progress to be made with less infrastructure. I am concerned that the eager approach only addresses a subset of the MV use case space, though. For example, if we presume that an MV is present because the underlying direct query would be non-performant, then we have to at least question whether applying the delta-update would also be detrimental to some use cases. In the eager maintenance approache, we have to consider a race condition where two different transactions change base tables simultaneously as discussed in [4]. I wonder if that nudges towards a logged approach. If the race is due to fact of JOIN-worthy tuples been made visible after a COMMIT, but not before, then does it not follow that the eager approach has to fire some kind of reconciliation work at COMMIT time? That seems to imply a persistent queue of some kind, since we can’t assume transactions to be so small to be able to hold the queue in memory. Hmm. I hadn’t really thought about that particular corner case. I guess a ‘catch' could be simply be to detect such a concurrent update and demote the refresh approach by marking the MV stale awaiting a full refresh. denty. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
I am concerned that the eager approach only addresses a subset of the MV use
case space, though. For example, if we presume that an MV is present because
the underlying direct query would be non-performant, then we have to at
least question whether applying the delta-update would also be detrimental
to some use cases.
Hi all, just wanted to say I am very happy to see progress made on this, my codebase has multiple "materialized tables" which are maintained with statement triggers (transition tables) and custom functions. They are ugly and a pain to maintain, but they work because I have no other solution...for now at least.I am concerned that the eager approach only addresses a subset of the MV use
case space, though. For example, if we presume that an MV is present because
the underlying direct query would be non-performant, then we have to at
least question whether applying the delta-update would also be detrimental
to some use cases.I will say that in my case, as long as my reads of the materialized view are always consistent with the underlying data, that's what's important. I don't mind if it's eager, or lazy (as long as lazy still means it will refresh prior to reading).
> Hi all, just wanted to say I am very happy to see progress made on this, > my codebase has multiple "materialized tables" which are maintained with > statement triggers (transition tables) and custom functions. They are ugly > and a pain to maintain, but they work because I have no other > solution...for now at least. > > I am concerned that the eager approach only addresses a subset of the MV use >> case space, though. For example, if we presume that an MV is present >> because >> the underlying direct query would be non-performant, then we have to at >> least question whether applying the delta-update would also be detrimental >> to some use cases. >> > > I will say that in my case, as long as my reads of the materialized view > are always consistent with the underlying data, that's what's important. I > don't mind if it's eager, or lazy (as long as lazy still means it will > refresh prior to reading). Assuming that we want to implement IVM incrementally (that means, for example, we implement DELETE for IVM in PostgreSQL XX, then INSERT for IVM for PostgreSQL XX+1... etc.), I think it's hard to do it with an eager approach if want to MV is always consistent with base tables. On the other hand, a lazy approach allows to implement IVM incrementally because we could always let full MV build from scratch if operations on MV include queries we do not support. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
> Hi all, just wanted to say I am very happy to see progress made on this,
> my codebase has multiple "materialized tables" which are maintained with
> statement triggers (transition tables) and custom functions. They are ugly
> and a pain to maintain, but they work because I have no other
> solution...for now at least.
>
> I am concerned that the eager approach only addresses a subset of the MV use
>> case space, though. For example, if we presume that an MV is present
>> because
>> the underlying direct query would be non-performant, then we have to at
>> least question whether applying the delta-update would also be detrimental
>> to some use cases.
>>
>
> I will say that in my case, as long as my reads of the materialized view
> are always consistent with the underlying data, that's what's important. I
> don't mind if it's eager, or lazy (as long as lazy still means it will
> refresh prior to reading).
Assuming that we want to implement IVM incrementally (that means, for
example, we implement DELETE for IVM in PostgreSQL XX, then INSERT for
IVM for PostgreSQL XX+1... etc.), I think it's hard to do it with an
eager approach if want to MV is always consistent with base tables.
On the other hand, a lazy approach allows to implement IVM
incrementally because we could always let full MV build from scratch
if operations on MV include queries we do not support.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hi! On Thu, Dec 27, 2018 at 4:57 AM Yugo Nagata <nagata@sraoss.co.jp> wrote: > I would like to implement Incremental View Maintenance (IVM) on PostgreSQL. > IVM is a technique to maintain materialized views which computes and applies > only the incremental changes to the materialized views rather than > recomputate the contents as the current REFRESH command does. That sounds great! I am interested in this topic because I am interested in reactive/live queries and support for them in PostgreSQL. [1] In that context, the problem is very similar: based on some state of query results and updated source tables, determine what should be new updates to send to the client describing changes to the query results. So after computing those incremental changes, instead of applying them to materialized view I would send them to the client. One could see materialized views only type of consumers of such information about incremental change. So I would like to ask if whatever is done in this setting is done in a way that one could also outside of the context of materialized view. Not sure what would API be thought. From the perspective of reactive/live queries, this package [2] is interesting. To my understanding, it adds to all base tables two columns, one for unique ID and one for revision of the row. And then rewrites queries so that this information is passed all the way to query results. In this way it can then determine mapping between inputs and outputs. I am not sure if it then does incremental update or just uses that to determine if view is invalidated. Not sure if there is anything about such approach in literature. Or why both index and revision columns are needed. > For these reasons, we started to think to implement IVM without relying on OIDs > and made a bit more surveys. I also do not see much difference between asking users to have primary key on base tables or asking them to have OIDs. Why do you think that a requirement for primary keys is a hard one? I think we should first focus on having IVM with base tables with primary keys. Maybe then later on we could improve on that and make it also work without. To me personally, having unique index on source tables and also on materialized view is a reasonable restriction for this feature. Especially for initial versions of it. > However, the discussion about IVM is now stoped, so we would like to restart and > progress this. What would be next steps in your view to move this further? > If we can represent a change of UPDATE on a base table as query-like rather than > OLD and NEW, it may be possible to update the materialized view directly instead > of performing delete & insert. Why do you need OLD and NEW? Don't you need just NEW and a list of columns which changed from those in NEW? I use such diffing query [4] to represent changes: first column has a flag telling if the row is representing insert, update, and remove, the second column tells which column are being changed in the case of the update, and then the NEW columns follow. I think that maybe standardizing structure for representing those changes would be a good step towards making this modular and reusable. Because then we can have three parts: * Recording and storing changes in a standard format. * A function which given original data, stored changes, computes updates needed, also in some standard format. * A function which given original data and updates needed, applies them. > In the previous discussion[4], it is planned to start from "eager" approach. In our PoC > implementaion, we used the other aproach, that is, using REFRESH command to perform IVM. > I am not sure which is better as a start point, but I begin to think that the eager > approach may be more simple since we don't have to maintain base table changes in other > past transactions. I think if we split things into three parts as I described above, then this is just a question of configuration. Or you call all three inside one trigger to update in "eager" fashion. Or you store computed updates somewhere and then on demand apply those in "lazy" fashion. > In the eager maintenance approache, we have to consider a race condition where two > different transactions change base tables simultaneously as discussed in [4]. But in the case of "lazy" maintenance there is a mirror problem: what if later changes to base tables invalidate some previous change to the materialized view. Imagine that one cell in a base table is first updated too "foo" and we compute an update for the materialized view to set it to "foo". And then the same cell is updated to "bar" and we compute an update for the materialized view again. If we have not applied any of those updates (because we are "lazy") now the previously computed update can be discarded. We could still apply both, but it would not be efficient. [1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com [2] https://github.com/nothingisdead/pg-live-query [3] https://www.postgresql.org/docs/devel/sql-createtable.html [4] https://github.com/tozd/node-reactive-postgres/blob/eeda4f28d096b6e552d04c5ea138c258cb5b9389/index.js#L329-L340 Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
On Tue, 1 Jan 2019 14:46:25 +0700 Nguyễn Trần Quốc Vinh <ntquocvinh@gmail.com> wrote: > We have some result on incremental update for MVs. We generate triggers on > C to do the incremental maintenance. We posted the code to github about 1 > year ago, but unfortunately i posted a not-right ctrigger.h header. The > mistake was exposed to me when a person could not compile the generated > triggers and reported to me. And now i re-posted with the right ctrigger.h > file. > > You can find the codes of the generator here: > https://github.com/ntqvinh/PgMvIncrementalUpdate/commits/master. You can > find how did we do here: > https://link.springer.com/article/10.1134/S0361768816050066. The paper is > about generating of codes in pl/pgsql. Anyway i see it is useful for > reading the codes. I don't know if i can share the paper or not so that i > don't publish anywhere else. The text about how to generate triggers in C > was published with open-access but unfortunately, it is in Vietnamese. > > We are happy if the codes are useful for someone. I have read your paper. It is interesting and great so that the algorithm is described concretely. After reading this, I have a few questions about your implementation. Although I may be able to understand by reading your paper and code carefully, I would appreciate it if you could answer these. - It is said there are many limitations on the view definition query. How does this work when the query is not supported? - Is it possible to support materialized views that have DISTINCT, OUTER JOIN, or sub-query in your approach? - It is said that AVG is splitted to SUM and COUNT. Are these new additional columns in MV visible for users? - Does this can handle the race condition discussed in [1], that is, if concurrent transactions update different two tables in the join view definition, is MV updated sucessfully? [1] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com Regards, -- Yugo Nagata <nagata@sraoss.co.jp>
On Mon, 7 Jan 2019 00:39:00 -0800 Mitar <mmitar@gmail.com> wrote: > That sounds great! I am interested in this topic because I am > interested in reactive/live queries and support for them in > PostgreSQL. [1] > > In that context, the problem is very similar: based on some state of > query results and updated source tables, determine what should be new > updates to send to the client describing changes to the query results. > So after computing those incremental changes, instead of applying them > to materialized view I would send them to the client. One could see > materialized views only type of consumers of such information about > incremental change. > > So I would like to ask if whatever is done in this setting is done in > a way that one could also outside of the context of materialized view. > Not sure what would API be thought. I didn't know about reactive/live queries but this seems share a part of problem with IVM, so we might have common API. BTW, what is uecase of reactive/live queries? (just curious) > > For these reasons, we started to think to implement IVM without relying on OIDs > > and made a bit more surveys. > > I also do not see much difference between asking users to have primary > key on base tables or asking them to have OIDs. Why do you think that > a requirement for primary keys is a hard one? I think we should first > focus on having IVM with base tables with primary keys. Maybe then > later on we could improve on that and make it also work without. > > To me personally, having unique index on source tables and also on > materialized view is a reasonable restriction for this feature. > Especially for initial versions of it. Initially, I chose to use OIDs for theoretical reason, that is, to handle "bag-semantics" which allows duplicate rows in tables. However, I agree that we start from the restriction of having unique index on base tables. > > If we can represent a change of UPDATE on a base table as query-like rather than > > OLD and NEW, it may be possible to update the materialized view directly instead > > of performing delete & insert. > > Why do you need OLD and NEW? Don't you need just NEW and a list of > columns which changed from those in NEW? I use such diffing query [4] > to represent changes: first column has a flag telling if the row is > representing insert, update, and remove, the second column tells which > column are being changed in the case of the update, and then the NEW > columns follow. According the change propagation equation approach, OLD is necessary to calculate tuples in MV to be deleted or modified. However, if tables has unique keys, such tuples can be identifeid using the keys, so OLD may not be needed, at least in eager approach. In lazy approach, OLD contents of table is useful. For example, with a join view MV = R * S, when dR is inserted into R and dS is inserted into S, the delta to be inserted into MV will be dMV = (R_old * dS) + (dR * S_new) = (R_old * dS) + (dR * S_old) + (dR * dS) , hence the old contents of tables R and S are needed. > I think that maybe standardizing structure for representing those > changes would be a good step towards making this modular and reusable. > Because then we can have three parts: > > * Recording and storing changes in a standard format. > * A function which given original data, stored changes, computes > updates needed, also in some standard format. > * A function which given original data and updates needed, applies them. > I think if we split things into three parts as I described above, then > this is just a question of configuration. Or you call all three inside > one trigger to update in "eager" fashion. Or you store computed > updates somewhere and then on demand apply those in "lazy" fashion. I agree that defining the format to represent changes is important. However, I am not sure both of eager and lazy can be handled in the same manner. I'll consider about this more. > > In the eager maintenance approache, we have to consider a race condition where two > > different transactions change base tables simultaneously as discussed in [4]. > > But in the case of "lazy" maintenance there is a mirror problem: what > if later changes to base tables invalidate some previous change to the > materialized view. Imagine that one cell in a base table is first > updated too "foo" and we compute an update for the materialized view > to set it to "foo". And then the same cell is updated to "bar" and we > compute an update for the materialized view again. If we have not > applied any of those updates (because we are "lazy") now the > previously computed update can be discarded. We could still apply > both, but it would not be efficient. In our PoC implementation, I handled this situation by removing old contents from NEW delata table. In your example, when the base table is updated from "foo" to "bar", the "foo" tuple is removed from and the "bar" tuple is inserted in NEW delta and the delta of MV is computed using the final NEW delta. Regards, -- Yugo Nagata <nagata@sraoss.co.jp>
Hi! On Thu, Jan 31, 2019 at 6:20 AM Yugo Nagata <nagata@sraoss.co.jp> wrote: > BTW, what is uecase of reactive/live queries? (just curious) It allows syncing the state between client and server. Client can then have a subset of data and server can push changes as they are happening to the client. Client can in a reactive manner render that in the UI to the user. So you can easily create a reactive UI which always shows up-to-date data without having to poll or something similar. How are things progressing? Any news on this topic? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
On Thu, 27 Dec 2018 21:57:26 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > Hi, > > I would like to implement Incremental View Maintenance (IVM) on PostgreSQL. I am now working on an initial patch for implementing IVM on PostgreSQL. This enables materialized views to be updated incrementally after one of their base tables is modified. At the first patch, I want to start from very simple features. Firstly, this will handle simple definition views which includes only selection, projection, and join. Standard aggregations (count, sum, avg, min, max) are not planned to be implemented in the first patch, but these are commonly used in materialized views, so I'll implement them later on. Views which include sub-query, outer-join, CTE, and window functions are also out of scope of the first patch. Also, views including self-join or views including other views in their definition is not considered well, either. I need more investigation on these type of views although I found some papers explaining how to handle sub-quries and outer-joins. Next, this will handle materialized views with no duplicates in their tuples. I am thinking of implementing an algorithm to handle duplicates called "counting-algorithm" afterward, but I'll start from this no-duplicates assumption in the first patch for simplicity. In the first patch, I will implement only "immediate maintenance", that is, materialized views are updated immediately ina transaction where a base table is modified. On other hand, in "deferred maintenance", materialized views are updated after the transaction, for example, by the user command like REFRESH. Although I plan to implement both eventually, I'll start from "immediate" because this seems to need smallercode than "deferred". For implementing "deferred", it is need to implement a mechanism to maintain logs for recording changes and an algorithm to compute the delta to be applied to materialized views are necessary. I plan to implement the immediate maintenance using AFTER triggers created automatically on a materialized view's base tables. In AFTER trigger using transition table features, changes occurs on base tables is recorded ephemeral relations. We can compute the delta to beapplied to materialized views by using these ephemeral relations and the view definition query, then update the view by applying this delta. -- Yugo Nagata <nagata@sraoss.co.jp>
On Sun, 31 Mar 2019 at 23:22, Yugo Nagata <nagata@sraoss.co.jp> wrote: > > Firstly, this will handle simple definition views which includes only > selection, projection, and join. Standard aggregations (count, sum, avg, > min, max) are not planned to be implemented in the first patch, but these > are commonly used in materialized views, so I'll implement them later on. It's fine to not have all the features from day 1 of course. But I just picked up this comment and the followup talking about splitting AVG into SUM and COUNT and I had a comment. When you do look at tackling aggregates I don't think you should restrict yourself to these specific standard aggregations. We have all the necessary abstractions to handle all aggregations that are feasible, see https://www.postgresql.org/docs/devel/xaggr.html#XAGGR-MOVING-AGGREGATES What you need to do -- I think -- is store the "moving aggregate state" before the final function. Then whenever a row is inserted or deleted or updated (or whenever another column is updated which causes the value to row to enter or leave the aggregation) apply either aggtransfn or aggminvtransfn to the state. I'm not sure if you want to apply the final function on every update or only lazily either may be better in some usage.
On Mon, 1 Apr 2019 12:11:22 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > On Thu, 27 Dec 2018 21:57:26 +0900 > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > Hi, > > > > I would like to implement Incremental View Maintenance (IVM) on PostgreSQL. > > I am now working on an initial patch for implementing IVM on PostgreSQL. > This enables materialized views to be updated incrementally after one > of their base tables is modified. Attached is a WIP patch of Incremental View Maintenance (IVM). Major part is written by me, and changes in syntax and pg_class are Hoshiai-san's work. Although this is sill a draft patch in work-in-progress, any suggestions or thoughts would be appreciated. * What it is This allows a kind of Immediate Maintenance of materialized views. if a materialized view is created by CRATE INCREMENTAL MATERIALIZED VIEW command, the contents of the mateview is updated automatically and incrementally after base tables are updated. Noted this syntax is just tentative, so it may be changed. ====== Example 1 ====== postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m AS SELECT * FROM t0; SELECT 3 postgres=# SELECT * FROM m; i --- 3 2 1 (3 rows) postgres=# INSERT INTO t0 VALUES (4); INSERT 0 1 postgres=# SELECt * FROM m; -- automatically updated i --- 3 2 1 4 (4 rows) ============================= This implementation also supports matviews including duplicate tuples or DISTINCT clause in its view definition query. For example, even if a matview is defined with DISTINCT to remove duplication of tuples in a base table, this can perform incremental update of the matview properly. That is, the contents of the matview doesn't change when exiting tuples are inserted into the base tables, and a tuple in the matview is deleted only when duplicity of the corresponding tuple in the base table becomes zero. This is due to "colunting alogorithm" in which the number of each tuple is stored in matviews as a special column value. ====== Example 2 ====== postgres=# SELECT * FROM t1; id | t ----+--- 1 | A 2 | B 3 | C 4 | A (4 rows) postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m1 AS SELECT t FROM t1; SELECT 3 postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m2 AS SELECT DISTINCT t FROM t1; SELECT 3 postgres=# SELECT * FROM m1; -- with duplicity t --- A A C B (4 rows) postgres=# SELECT * FROM m2; t --- A B C (3 rows) postgres=# INSERT INTO t1 VALUES (5, 'B'); INSERT 0 1 postgres=# DELETE FROM t1 WHERE id IN (1,3); -- delete (1,A),(3,C) DELETE 2 postgres=# SELECT * FROM m1; -- one A left and one more B t --- B B A (3 rows) postgres=# SELECT * FROM m2; -- only C is removed t --- B A (2 rows) ============================= * How it works 1. Creating matview When a matview is created, AFTER triggers are internally created on its base tables. When the base tables is modified (INSERT, DELETE, UPDATE), the matview is updated incrementally in the trigger function. When populating the matview, GROUP BY and count(*) are added to the view definition query before this is executed for counting duplicity of tuples in the matview. The result of count is stored in the matview as a special column named "__ivm_count__". 2. Maintenance of matview When base tables are modified, the change set of the table can be referred as Ephemeral Named Relations (ENRs) thanks to Transition Table (a feature of trigger implemented since PG10). We can calculate the diff set of the matview by replacing the base table in the view definition query with the ENR (at least if it is Selection-Projection -Join view). As well as view definition time, GROUP BY and count(*) is added in order to count the duplicity of tuples in the diff set. As a result, two diff sets (to be deleted from and to be inserted into the matview) are calculated, and the results are stored into temporary tables respectively. The matiview is updated by merging these change sets. Instead of executing DELETE or INSERT simply, the values of __ivm_count__ column in the matview is decreased or increased. When the values becomes zero, the corresponding tuple is deleted from the matview. 3. Access to matview When SELECT is issued for IVM matviews defined with DISTINCT, all columns except to __ivm_count__ of each tuple in the matview are returned. This is correct because duplicity of tuples are eliminated by GROUP BY. When DISTINCT is not used, SELECT for the IVM matviews returns each tuple __ivm_count__ times. Currently, this is implemented by rewriting the SELECT query to replace the matview RTE with a subquery which joins the matview and generate_series function as bellow. SELECT mv.* FROM mv, generate_series(1, mv.__ivm_count__); __ivm_count__ column is invisible for users when "SELECT * FROM ..." is issued, but users can see the value by specifying in target list explicitly. ====== Example 3 ====== postgres=# \d+ m1 Materialized view "public.m1" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ---------------+--------+-----------+----------+---------+----------+--------------+------------- t | text | | | | extended | | __ivm_count__ | bigint | | | | plain | | View definition: SELECT t1.t FROM t1; Access method: heap postgres=# \d+ m2 Materialized view "public.m2" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ---------------+--------+-----------+----------+---------+----------+--------------+------------- t | text | | | | extended | | __ivm_count__ | bigint | | | | plain | | View definition: SELECT DISTINCT t1.t FROM t1; Access method: heap postgres=# SELECT *, __ivm_count__ FROM m1; t | __ivm_count__ ---+--------------- B | 2 B | 2 A | 1 (3 rows) postgres=# SELECT *, __ivm_count__ FROM m2; t | __ivm_count__ ---+--------------- B | 2 A | 1 (2 rows) postgres=# EXPLAIN SELECT * FROM m1; QUERY PLAN ------------------------------------------------------------------------------ Nested Loop (cost=0.00..61.03 rows=3000 width=2) -> Seq Scan on m1 mv (cost=0.00..1.03 rows=3 width=10) -> Function Scan on generate_series (cost=0.00..10.00 rows=1000 width=0) (3 rows) ============================= * Simple Performance Evaluation I confirmed that "incremental" update of matviews is more effective than the standard REFRESH by using simple exapmle. I used tables of pgbench (SF=100) here. Create two matviews, that is, without and with IVM. test=# CREATE MATERIALIZED VIEW bench1 AS SELECT aid, bid, abalance, bbalance FROM pgbench_accounts JOIN pgbench_branches USING (bid) WHERE abalance > 0 OR bbalance > 0; SELECT 5001054 test=# CREATE INCREMENTAL MATERIALIZED VIEW bench2 AS SELECT aid, bid, abalance, bbalance FROM pgbench_accounts JOIN pgbench_branches USING (bid) WHERE abalance > 0 OR bbalance > 0; SELECT 5001054 The standard REFRESH of bench1 took more than 10 seconds. test=# \timing Timing is on. test=# REFRESH MATERIALIZED VIEW bench1 ; REFRESH MATERIALIZED VIEW Time: 11210.563 ms (00:11.211) Create an index on the IVM matview (bench2). test=# CREATE INDEX on bench2(aid,bid); CREATE INDEX Updating a tuple in pgbench_accounts took 18ms. After this, bench2 was updated automatically and correctly. test=# SELECT * FROM bench2 WHERE aid = 1; aid | bid | abalance | bbalance -----+-----+----------+---------- 1 | 1 | 10 | 10 (1 row) Time: 2.498 ms test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1; UPDATE 1 Time: 18.634 ms test=# SELECT * FROM bench2 WHERE aid = 1; aid | bid | abalance | bbalance -----+-----+----------+---------- 1 | 1 | 1000 | 10 (1 row) However, if there is not the index on bench2, it took 4 sec, so appropriate indexes are needed on IVM matviews. test=# DROP INDEX bench2_aid_bid_idx ; DROP INDEX Time: 10.613 ms test=# UPDATE pgbench_accounts SET abalance = 2000 WHERE aid = 1; UPDATE 1 Time: 3931.274 ms (00:03.931) * Restrictions on view definition This patch is still in Work-in-Progress and there are many restrictions on the view definition query of matviews. The current implementation supports views including selection, projection, and inner join with or without DISTINCT. Aggregation and GROUP BY are not supported yet, but I plan to deal with these by the first release. Self-join, subqueries, OUTER JOIN, CTE, window functions are not considered well, either. I need more investigation on these type of views although I found some papers explaining how to handle sub-queries and outer-joins. These unsupported views should be checked when a matview is created, but this is not implemented yet. Hoshiai-san are working on this. * Timing of view maintenance This patch implements a kind of Immediate Maintenance, that is, a matview is updated immediately when a base table is modified. On other hand, in "Deferred Maintenance", matviews are updated after the transaction, for example, by the user command like REFRESH. For implementing "deferred", it is need to implement a mechanism to maintain logs for recording changes of base tables and an algorithm to compute the delta to be applied to matviews. In addition, there could be another implementation of Immediate Maintenance in which matview is updated at the end of a transaction that modified base table, rather than in AFTER trigger. Oracle supports this type of IVM. To implement this, we will need a mechanism to maintain change logs on base tables as well as Deferred maintenance. * Counting algorithm implementation There will be also discussions on counting-algorithm implementation. Firstly, the current patch treats "__ivm_count__" as a special column name in a somewhat ad hoc way. This is used when maintaining and accessing matviews, and when "SELECT * FROM ..." is issued, __ivm_count__ column is invisible for users. Maybe this name has to be inhibited in user tables. Is it acceptable to use such columns for IVM, and is there better way, if not? Secondly, a matview with duplicate tuples is replaces with a subquery which uses generate_series function. It does not have to be generate_series, and we can make a new set returning function for this. Anyway, this internal behaviour is visible in EXPLAIN results as shown in Example 3. Also, there is a performance impact because estimated rows number is wrong, and what is worse, the cost of join is not small when the size of matview is large. Therefore, we might have to add a new plan node for selecting from matviews rather than using such a special set returning function. Ragards, -- Yugo Nagata <nagata@sraoss.co.jp>
Attachment
Hi hackers, Thank you for your many questions and feedbacks at PGCon 2019. Attached is the patch rebased for the current master branch. Regards, Yugo Nagata On Tue, 14 May 2019 15:46:48 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > On Mon, 1 Apr 2019 12:11:22 +0900 > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > On Thu, 27 Dec 2018 21:57:26 +0900 > > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > > > Hi, > > > > > > I would like to implement Incremental View Maintenance (IVM) on PostgreSQL. > > > > I am now working on an initial patch for implementing IVM on PostgreSQL. > > This enables materialized views to be updated incrementally after one > > of their base tables is modified. > > Attached is a WIP patch of Incremental View Maintenance (IVM). > Major part is written by me, and changes in syntax and pg_class > are Hoshiai-san's work. > > Although this is sill a draft patch in work-in-progress, any > suggestions or thoughts would be appreciated. > > * What it is > > This allows a kind of Immediate Maintenance of materialized views. if a > materialized view is created by CRATE INCREMENTAL MATERIALIZED VIEW command, > the contents of the mateview is updated automatically and incrementally > after base tables are updated. Noted this syntax is just tentative, so it > may be changed. > > ====== Example 1 ====== > postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m AS SELECT * FROM t0; > SELECT 3 > postgres=# SELECT * FROM m; > i > --- > 3 > 2 > 1 > (3 rows) > > postgres=# INSERT INTO t0 VALUES (4); > INSERT 0 1 > postgres=# SELECt * FROM m; -- automatically updated > i > --- > 3 > 2 > 1 > 4 > (4 rows) > ============================= > > This implementation also supports matviews including duplicate tuples or > DISTINCT clause in its view definition query. For example, even if a matview > is defined with DISTINCT to remove duplication of tuples in a base table, this > can perform incremental update of the matview properly. That is, the contents > of the matview doesn't change when exiting tuples are inserted into the base > tables, and a tuple in the matview is deleted only when duplicity of the > corresponding tuple in the base table becomes zero. > > This is due to "colunting alogorithm" in which the number of each tuple is > stored in matviews as a special column value. > > ====== Example 2 ====== > postgres=# SELECT * FROM t1; > id | t > ----+--- > 1 | A > 2 | B > 3 | C > 4 | A > (4 rows) > > postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m1 AS SELECT t FROM t1; > SELECT 3 > postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m2 AS SELECT DISTINCT t FROM t1; > SELECT 3 > postgres=# SELECT * FROM m1; -- with duplicity > t > --- > A > A > C > B > (4 rows) > > postgres=# SELECT * FROM m2; > t > --- > A > B > C > (3 rows) > > postgres=# INSERT INTO t1 VALUES (5, 'B'); > INSERT 0 1 > postgres=# DELETE FROM t1 WHERE id IN (1,3); -- delete (1,A),(3,C) > DELETE 2 > postgres=# SELECT * FROM m1; -- one A left and one more B > t > --- > B > B > A > (3 rows) > > postgres=# SELECT * FROM m2; -- only C is removed > t > --- > B > A > (2 rows) > ============================= > > * How it works > > 1. Creating matview > > When a matview is created, AFTER triggers are internally created > on its base tables. When the base tables is modified (INSERT, DELETE, > UPDATE), the matview is updated incrementally in the trigger function. > > When populating the matview, GROUP BY and count(*) are added to the > view definition query before this is executed for counting duplicity > of tuples in the matview. The result of count is stored in the matview > as a special column named "__ivm_count__". > > 2. Maintenance of matview > > When base tables are modified, the change set of the table can be > referred as Ephemeral Named Relations (ENRs) thanks to Transition Table > (a feature of trigger implemented since PG10). We can calculate the diff > set of the matview by replacing the base table in the view definition > query with the ENR (at least if it is Selection-Projection -Join view). > As well as view definition time, GROUP BY and count(*) is added in order > to count the duplicity of tuples in the diff set. As a result, two diff > sets (to be deleted from and to be inserted into the matview) are > calculated, and the results are stored into temporary tables respectively. > > The matiview is updated by merging these change sets. Instead of executing > DELETE or INSERT simply, the values of __ivm_count__ column in the matview > is decreased or increased. When the values becomes zero, the corresponding > tuple is deleted from the matview. > > 3. Access to matview > > When SELECT is issued for IVM matviews defined with DISTINCT, all columns > except to __ivm_count__ of each tuple in the matview are returned. This is > correct because duplicity of tuples are eliminated by GROUP BY. > > When DISTINCT is not used, SELECT for the IVM matviews returns each tuple > __ivm_count__ times. Currently, this is implemented by rewriting the SELECT > query to replace the matview RTE with a subquery which joins the matview > and generate_series function as bellow. > > SELECT mv.* FROM mv, generate_series(1, mv.__ivm_count__); > > __ivm_count__ column is invisible for users when "SELECT * FROM ..." is > issued, but users can see the value by specifying in target list explicitly. > > ====== Example 3 ====== > postgres=# \d+ m1 > Materialized view "public.m1" > Column | Type | Collation | Nullable | Default | Storage | Stats target | Description > ---------------+--------+-----------+----------+---------+----------+--------------+------------- > t | text | | | | extended | | > __ivm_count__ | bigint | | | | plain | | > View definition: > SELECT t1.t > FROM t1; > Access method: heap > > postgres=# \d+ m2 > Materialized view "public.m2" > Column | Type | Collation | Nullable | Default | Storage | Stats target | Description > ---------------+--------+-----------+----------+---------+----------+--------------+------------- > t | text | | | | extended | | > __ivm_count__ | bigint | | | | plain | | > View definition: > SELECT DISTINCT t1.t > FROM t1; > Access method: heap > > postgres=# SELECT *, __ivm_count__ FROM m1; > t | __ivm_count__ > ---+--------------- > B | 2 > B | 2 > A | 1 > (3 rows) > > postgres=# SELECT *, __ivm_count__ FROM m2; > t | __ivm_count__ > ---+--------------- > B | 2 > A | 1 > (2 rows) > > postgres=# EXPLAIN SELECT * FROM m1; > QUERY PLAN > ------------------------------------------------------------------------------ > Nested Loop (cost=0.00..61.03 rows=3000 width=2) > -> Seq Scan on m1 mv (cost=0.00..1.03 rows=3 width=10) > -> Function Scan on generate_series (cost=0.00..10.00 rows=1000 width=0) > (3 rows) > ============================= > > * Simple Performance Evaluation > > I confirmed that "incremental" update of matviews is more effective > than the standard REFRESH by using simple exapmle. I used tables > of pgbench (SF=100) here. > > Create two matviews, that is, without and with IVM. > > test=# CREATE MATERIALIZED VIEW bench1 AS > SELECT aid, bid, abalance, bbalance > FROM pgbench_accounts JOIN pgbench_branches USING (bid) > WHERE abalance > 0 OR bbalance > 0; > SELECT 5001054 > test=# CREATE INCREMENTAL MATERIALIZED VIEW bench2 AS > SELECT aid, bid, abalance, bbalance > FROM pgbench_accounts JOIN pgbench_branches USING (bid) > WHERE abalance > 0 OR bbalance > 0; > SELECT 5001054 > > The standard REFRESH of bench1 took more than 10 seconds. > > test=# \timing > Timing is on. > test=# REFRESH MATERIALIZED VIEW bench1 ; > REFRESH MATERIALIZED VIEW > Time: 11210.563 ms (00:11.211) > > Create an index on the IVM matview (bench2). > > test=# CREATE INDEX on bench2(aid,bid); > CREATE INDEX > > Updating a tuple in pgbench_accounts took 18ms. After this, bench2 > was updated automatically and correctly. > > test=# SELECT * FROM bench2 WHERE aid = 1; > aid | bid | abalance | bbalance > -----+-----+----------+---------- > 1 | 1 | 10 | 10 > (1 row) > > Time: 2.498 ms > test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1; > UPDATE 1 > Time: 18.634 ms > test=# SELECT * FROM bench2 WHERE aid = 1; > aid | bid | abalance | bbalance > -----+-----+----------+---------- > 1 | 1 | 1000 | 10 > (1 row) > > However, if there is not the index on bench2, it took 4 sec, so > appropriate indexes are needed on IVM matviews. > > test=# DROP INDEX bench2_aid_bid_idx ; > DROP INDEX > Time: 10.613 ms > test=# UPDATE pgbench_accounts SET abalance = 2000 WHERE aid = 1; > UPDATE 1 > Time: 3931.274 ms (00:03.931) > > * Restrictions on view definition > > This patch is still in Work-in-Progress and there are many restrictions > on the view definition query of matviews. > > The current implementation supports views including selection, projection, > and inner join with or without DISTINCT. Aggregation and GROUP BY are not > supported yet, but I plan to deal with these by the first release. > Self-join, subqueries, OUTER JOIN, CTE, window functions are not > considered well, either. I need more investigation on these type of views > although I found some papers explaining how to handle sub-queries and > outer-joins. > > These unsupported views should be checked when a matview is created, but > this is not implemented yet. Hoshiai-san are working on this. > > * Timing of view maintenance > > This patch implements a kind of Immediate Maintenance, that is, a matview > is updated immediately when a base table is modified. On other hand, in > "Deferred Maintenance", matviews are updated after the transaction, for > example, by the user command like REFRESH. > > For implementing "deferred", it is need to implement a mechanism to maintain > logs for recording changes of base tables and an algorithm to compute the > delta to be applied to matviews. > > In addition, there could be another implementation of Immediate Maintenance > in which matview is updated at the end of a transaction that modified base > table, rather than in AFTER trigger. Oracle supports this type of IVM. To > implement this, we will need a mechanism to maintain change logs on base > tables as well as Deferred maintenance. > > * Counting algorithm implementation > > There will be also discussions on counting-algorithm implementation. > Firstly, the current patch treats "__ivm_count__" as a special column name > in a somewhat ad hoc way. This is used when maintaining and accessing matviews, > and when "SELECT * FROM ..." is issued, __ivm_count__ column is invisible for > users. Maybe this name has to be inhibited in user tables. Is it acceptable > to use such columns for IVM, and is there better way, if not? > > Secondly, a matview with duplicate tuples is replaces with a subquery which > uses generate_series function. It does not have to be generate_series, and we > can make a new set returning function for this. Anyway, this internal behaviour > is visible in EXPLAIN results as shown in Example 3. Also, there is a > performance impact because estimated rows number is wrong, and what is worse, > the cost of join is not small when the size of matview is large. Therefore, we > might have to add a new plan node for selecting from matviews rather than using > such a special set returning function. > > > Ragards, > -- > Yugo Nagata <nagata@sraoss.co.jp> -- Yugo Nagata <nagata@sraoss.co.jp>
Attachment
Hi Yugo, I'd like to compare the performance of your MV refresh algorithm versus an approach that logs changes into an mv log table, and can then apply the changes at some later point in time. I'd like to handle the materialized join view (mjv) case first, specifically a 2-way left outer join, with a UDF in the SELECT list of the mjv. Does your refresh algorithm handle mjv's with connected join graphs that consist entirely of inner and left outer joins? If so, I'd like to measure the overhead of your refresh algorithm on pgbench, modified to include an mjv, versus a (hand coded) incremental maintenance algorithm that uses mv log tables populated by ordinary triggers. We may also want to look at capturing the deltas using logical replication, which ought to be faster than a trigger-based solution. I have someone available to do the performance testing for another 2 months, so if you can connect with me off-list to coordinate, we can set up the performance experiments and run them on our AWS clusters. best regards, /Jim F ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Hi Jim, On Fri, 21 Jun 2019 08:41:11 -0700 (MST) Jim Finnerty <jfinnert@amazon.com> wrote: > Hi Yugo, > > I'd like to compare the performance of your MV refresh algorithm versus > an approach that logs changes into an mv log table, and can then apply the > changes at some later point in time. I'd like to handle the materialized > join view (mjv) case first, specifically a 2-way left outer join, with a UDF > in the SELECT list of the mjv. Do you mean you have your implementation of IVM that using log tables? I'm so interested in this, and I would appreciate it if you explain the detail. > Does your refresh algorithm handle mjv's with connected join graphs that > consist entirely of inner and left outer joins? > If so, I'd like to measure the overhead of your refresh algorithm on > pgbench, modified to include an mjv, versus a (hand coded) incremental > maintenance algorithm that uses mv log tables populated by ordinary > triggers. We may also want to look at capturing the deltas using logical > replication, which ought to be faster than a trigger-based solution. In the current our implementation, outer joins is not yet supported though we plan to handle this in future. So,we would not be able to compare these directly in the same workload in the current status. However, the current our implementation supports only the way to update materialized views in a trigger, and the performance of modifying base tables will be lower than the approach which uses log tables. This is because queries to update materialized views are issued in the trigger. This is not only a overhead itself, but also takes a lock on a materialized view, which has an ] impact on concurrent execution performance. In the previous our PoC, we implemented IVM using log tables, in which logs are captured by triggers and materialized views are update incrementally by a user command[1]. However, to implement log table approach, we need a infrastructure to maintain these logs. For example, which logs are necessary and which logs can be discarded, etc. We thought this is not trivial work, so we decided to start from the current approach which doesn't use log tables. We are now preparing to implement this in the next step because this is also needed to support deferred maintenance of views. [1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ I agree that capturing the deltas using logical decoding will be faster than using a trigger although we haven't yet consider this well. Best regadrds, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Hi, Attached is a WIP patch of IVM which supports some aggregate functions. Currently, only count and sum are supported. Avg, min, or max is not supported although I think supporting this would not be so hard. As a restriction, expressions specified in GROUP BY must appear in the target list of views because tuples to be updated in MV are identified by using this group keys. In the case of views without aggregate functions, only the number of tuple duplicates (__ivm_count__) are updated at incremental maintenance. On the other hand, in the case of vies with aggregations, the aggregated values are also updated. The way of update depends the kind of aggregate function. In the case of sum (or agg functions except to count), NULL in input values is ignored, and this returns a null value when no rows are selected. To support this specification, the number of not-NULL input values is counted and stored in MV as a hidden column whose name is like __ivm_count_sum__, for example. In the case of count, this returns zero when no rows are selected, and count(*) doesn't ignore NULL input. These specification are also supported. Tuples to be updated in MV are identified by using keys specified by GROUP BY clause. However, in the case of aggregation without GROUP BY, there is only one tuple in the view, so keys are not uses to identify tuples. In addition, a race condition which occurred in the previous version is prevented in this patch. In the previous version, when two translocations change a base tables concurrently, an anormal update of MV was possible because a change in one transaction was not visible for another transaction even in READ COMMITTED level. To prevent this, I fix this to take a lock in early stage of view maintenance to wait for concurrent transactions which are updating the same MV end. Also, we have to get the latest snapshot before computting delta tables because any changes which occurs in other transaction during lock waiting is not visible even in READ COMMITTED level. In REPEATABLE READ or SERIALIZABLE level, don't wait a lock, and raise an error immediately to prevent anormal update. These solutions might be ugly, but something to prevent anormal update is anyway necessary. There may be better way. Moreover, some regression test are added for aggregate functions support. This is Hoshiai-san's work. Although the code is not refined yet and I will need a deal of refactoring and reorganizing, I submitted this to share the current status. * Exapmle (from regression test) ======================================================================= (1) creating tables CREATE TABLE mv_base_a (i int, j int); INSERT INTO mv_base_a VALUES (1,10), (2,20), (3,30), (4,40), (5,50); (2) Views sith SUM() and COUNT() aggregation function BEGIN; CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i, SUM(j), COUNT(i) FROM mv_base_a GROUP BY i; SELECT * FROM mv_ivm_agg ORDER BY 1,2,3; i | sum | count ---+-----+------- 1 | 10 | 1 2 | 20 | 1 3 | 30 | 1 4 | 40 | 1 5 | 50 | 1 (5 rows) INSERT INTO mv_base_a VALUES(2,100); SELECT * FROM mv_ivm_agg ORDER BY 1,2,3; i | sum | count ---+-----+------- 1 | 10 | 1 2 | 120 | 2 3 | 30 | 1 4 | 40 | 1 5 | 50 | 1 (5 rows) UPDATE mv_base_a SET j = 200 WHERE (i,j) = (2,100); SELECT * FROM mv_ivm_agg ORDER BY 1,2,3; i | sum | count ---+-----+------- 1 | 10 | 1 2 | 220 | 2 3 | 30 | 1 4 | 40 | 1 5 | 50 | 1 (5 rows) DELETE FROM mv_base_a WHERE (i,j) = (2,200); SELECT * FROM mv_ivm_agg ORDER BY 1,2,3; i | sum | count ---+-----+------- 1 | 10 | 1 2 | 20 | 1 3 | 30 | 1 4 | 40 | 1 5 | 50 | 1 (5 rows) ROLLBACK; (3) Views with COUNT(*) aggregation function BEGIN; CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i, SUM(j),COUNT(*) FROM mv_base_a GROUP BY i; SELECT * FROM mv_ivm_agg ORDER BY 1,2,3; i | sum | count ---+-----+------- 1 | 10 | 1 2 | 20 | 1 3 | 30 | 1 4 | 40 | 1 5 | 50 | 1 (5 rows) INSERT INTO mv_base_a VALUES(2,100); SELECT * FROM mv_ivm_agg ORDER BY 1,2,3; i | sum | count ---+-----+------- 1 | 10 | 1 2 | 120 | 2 3 | 30 | 1 4 | 40 | 1 5 | 50 | 1 (5 rows) ROLLBACK; (4) Views with aggregation function without GROUP clause BEGIN; CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_group AS SELECT SUM(j)FROM mv_base_a; SELECT * FROM mv_ivm_group ORDER BY 1; sum ----- 150 (1 row) INSERT INTO mv_base_a VALUES(6,20); SELECT * FROM mv_ivm_group ORDER BY 1; sum ----- 170 (1 row) ======================================================================= On Thu, 20 Jun 2019 16:44:10 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > Hi hackers, > > Thank you for your many questions and feedbacks at PGCon 2019. > Attached is the patch rebased for the current master branch. > > Regards, > Yugo Nagata > > On Tue, 14 May 2019 15:46:48 +0900 > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > On Mon, 1 Apr 2019 12:11:22 +0900 > > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > > > On Thu, 27 Dec 2018 21:57:26 +0900 > > > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > > > > > Hi, > > > > > > > > I would like to implement Incremental View Maintenance (IVM) on PostgreSQL. > > > > > > I am now working on an initial patch for implementing IVM on PostgreSQL. > > > This enables materialized views to be updated incrementally after one > > > of their base tables is modified. > > > > Attached is a WIP patch of Incremental View Maintenance (IVM). > > Major part is written by me, and changes in syntax and pg_class > > are Hoshiai-san's work. > > > > Although this is sill a draft patch in work-in-progress, any > > suggestions or thoughts would be appreciated. > > > > * What it is > > > > This allows a kind of Immediate Maintenance of materialized views. if a > > materialized view is created by CRATE INCREMENTAL MATERIALIZED VIEW command, > > the contents of the mateview is updated automatically and incrementally > > after base tables are updated. Noted this syntax is just tentative, so it > > may be changed. > > > > ====== Example 1 ====== > > postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m AS SELECT * FROM t0; > > SELECT 3 > > postgres=# SELECT * FROM m; > > i > > --- > > 3 > > 2 > > 1 > > (3 rows) > > > > postgres=# INSERT INTO t0 VALUES (4); > > INSERT 0 1 > > postgres=# SELECt * FROM m; -- automatically updated > > i > > --- > > 3 > > 2 > > 1 > > 4 > > (4 rows) > > ============================= > > > > This implementation also supports matviews including duplicate tuples or > > DISTINCT clause in its view definition query. For example, even if a matview > > is defined with DISTINCT to remove duplication of tuples in a base table, this > > can perform incremental update of the matview properly. That is, the contents > > of the matview doesn't change when exiting tuples are inserted into the base > > tables, and a tuple in the matview is deleted only when duplicity of the > > corresponding tuple in the base table becomes zero. > > > > This is due to "colunting alogorithm" in which the number of each tuple is > > stored in matviews as a special column value. > > > > ====== Example 2 ====== > > postgres=# SELECT * FROM t1; > > id | t > > ----+--- > > 1 | A > > 2 | B > > 3 | C > > 4 | A > > (4 rows) > > > > postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m1 AS SELECT t FROM t1; > > SELECT 3 > > postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m2 AS SELECT DISTINCT t FROM t1; > > SELECT 3 > > postgres=# SELECT * FROM m1; -- with duplicity > > t > > --- > > A > > A > > C > > B > > (4 rows) > > > > postgres=# SELECT * FROM m2; > > t > > --- > > A > > B > > C > > (3 rows) > > > > postgres=# INSERT INTO t1 VALUES (5, 'B'); > > INSERT 0 1 > > postgres=# DELETE FROM t1 WHERE id IN (1,3); -- delete (1,A),(3,C) > > DELETE 2 > > postgres=# SELECT * FROM m1; -- one A left and one more B > > t > > --- > > B > > B > > A > > (3 rows) > > > > postgres=# SELECT * FROM m2; -- only C is removed > > t > > --- > > B > > A > > (2 rows) > > ============================= > > > > * How it works > > > > 1. Creating matview > > > > When a matview is created, AFTER triggers are internally created > > on its base tables. When the base tables is modified (INSERT, DELETE, > > UPDATE), the matview is updated incrementally in the trigger function. > > > > When populating the matview, GROUP BY and count(*) are added to the > > view definition query before this is executed for counting duplicity > > of tuples in the matview. The result of count is stored in the matview > > as a special column named "__ivm_count__". > > > > 2. Maintenance of matview > > > > When base tables are modified, the change set of the table can be > > referred as Ephemeral Named Relations (ENRs) thanks to Transition Table > > (a feature of trigger implemented since PG10). We can calculate the diff > > set of the matview by replacing the base table in the view definition > > query with the ENR (at least if it is Selection-Projection -Join view). > > As well as view definition time, GROUP BY and count(*) is added in order > > to count the duplicity of tuples in the diff set. As a result, two diff > > sets (to be deleted from and to be inserted into the matview) are > > calculated, and the results are stored into temporary tables respectively. > > > > The matiview is updated by merging these change sets. Instead of executing > > DELETE or INSERT simply, the values of __ivm_count__ column in the matview > > is decreased or increased. When the values becomes zero, the corresponding > > tuple is deleted from the matview. > > > > 3. Access to matview > > > > When SELECT is issued for IVM matviews defined with DISTINCT, all columns > > except to __ivm_count__ of each tuple in the matview are returned. This is > > correct because duplicity of tuples are eliminated by GROUP BY. > > > > When DISTINCT is not used, SELECT for the IVM matviews returns each tuple > > __ivm_count__ times. Currently, this is implemented by rewriting the SELECT > > query to replace the matview RTE with a subquery which joins the matview > > and generate_series function as bellow. > > > > SELECT mv.* FROM mv, generate_series(1, mv.__ivm_count__); > > > > __ivm_count__ column is invisible for users when "SELECT * FROM ..." is > > issued, but users can see the value by specifying in target list explicitly. > > > > ====== Example 3 ====== > > postgres=# \d+ m1 > > Materialized view "public.m1" > > Column | Type | Collation | Nullable | Default | Storage | Stats target | Description > > ---------------+--------+-----------+----------+---------+----------+--------------+------------- > > t | text | | | | extended | | > > __ivm_count__ | bigint | | | | plain | | > > View definition: > > SELECT t1.t > > FROM t1; > > Access method: heap > > > > postgres=# \d+ m2 > > Materialized view "public.m2" > > Column | Type | Collation | Nullable | Default | Storage | Stats target | Description > > ---------------+--------+-----------+----------+---------+----------+--------------+------------- > > t | text | | | | extended | | > > __ivm_count__ | bigint | | | | plain | | > > View definition: > > SELECT DISTINCT t1.t > > FROM t1; > > Access method: heap > > > > postgres=# SELECT *, __ivm_count__ FROM m1; > > t | __ivm_count__ > > ---+--------------- > > B | 2 > > B | 2 > > A | 1 > > (3 rows) > > > > postgres=# SELECT *, __ivm_count__ FROM m2; > > t | __ivm_count__ > > ---+--------------- > > B | 2 > > A | 1 > > (2 rows) > > > > postgres=# EXPLAIN SELECT * FROM m1; > > QUERY PLAN > > ------------------------------------------------------------------------------ > > Nested Loop (cost=0.00..61.03 rows=3000 width=2) > > -> Seq Scan on m1 mv (cost=0.00..1.03 rows=3 width=10) > > -> Function Scan on generate_series (cost=0.00..10.00 rows=1000 width=0) > > (3 rows) > > ============================= > > > > * Simple Performance Evaluation > > > > I confirmed that "incremental" update of matviews is more effective > > than the standard REFRESH by using simple exapmle. I used tables > > of pgbench (SF=100) here. > > > > Create two matviews, that is, without and with IVM. > > > > test=# CREATE MATERIALIZED VIEW bench1 AS > > SELECT aid, bid, abalance, bbalance > > FROM pgbench_accounts JOIN pgbench_branches USING (bid) > > WHERE abalance > 0 OR bbalance > 0; > > SELECT 5001054 > > test=# CREATE INCREMENTAL MATERIALIZED VIEW bench2 AS > > SELECT aid, bid, abalance, bbalance > > FROM pgbench_accounts JOIN pgbench_branches USING (bid) > > WHERE abalance > 0 OR bbalance > 0; > > SELECT 5001054 > > > > The standard REFRESH of bench1 took more than 10 seconds. > > > > test=# \timing > > Timing is on. > > test=# REFRESH MATERIALIZED VIEW bench1 ; > > REFRESH MATERIALIZED VIEW > > Time: 11210.563 ms (00:11.211) > > > > Create an index on the IVM matview (bench2). > > > > test=# CREATE INDEX on bench2(aid,bid); > > CREATE INDEX > > > > Updating a tuple in pgbench_accounts took 18ms. After this, bench2 > > was updated automatically and correctly. > > > > test=# SELECT * FROM bench2 WHERE aid = 1; > > aid | bid | abalance | bbalance > > -----+-----+----------+---------- > > 1 | 1 | 10 | 10 > > (1 row) > > > > Time: 2.498 ms > > test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1; > > UPDATE 1 > > Time: 18.634 ms > > test=# SELECT * FROM bench2 WHERE aid = 1; > > aid | bid | abalance | bbalance > > -----+-----+----------+---------- > > 1 | 1 | 1000 | 10 > > (1 row) > > > > However, if there is not the index on bench2, it took 4 sec, so > > appropriate indexes are needed on IVM matviews. > > > > test=# DROP INDEX bench2_aid_bid_idx ; > > DROP INDEX > > Time: 10.613 ms > > test=# UPDATE pgbench_accounts SET abalance = 2000 WHERE aid = 1; > > UPDATE 1 > > Time: 3931.274 ms (00:03.931) > > > > * Restrictions on view definition > > > > This patch is still in Work-in-Progress and there are many restrictions > > on the view definition query of matviews. > > > > The current implementation supports views including selection, projection, > > and inner join with or without DISTINCT. Aggregation and GROUP BY are not > > supported yet, but I plan to deal with these by the first release. > > Self-join, subqueries, OUTER JOIN, CTE, window functions are not > > considered well, either. I need more investigation on these type of views > > although I found some papers explaining how to handle sub-queries and > > outer-joins. > > > > These unsupported views should be checked when a matview is created, but > > this is not implemented yet. Hoshiai-san are working on this. > > > > * Timing of view maintenance > > > > This patch implements a kind of Immediate Maintenance, that is, a matview > > is updated immediately when a base table is modified. On other hand, in > > "Deferred Maintenance", matviews are updated after the transaction, for > > example, by the user command like REFRESH. > > > > For implementing "deferred", it is need to implement a mechanism to maintain > > logs for recording changes of base tables and an algorithm to compute the > > delta to be applied to matviews. > > > > In addition, there could be another implementation of Immediate Maintenance > > in which matview is updated at the end of a transaction that modified base > > table, rather than in AFTER trigger. Oracle supports this type of IVM. To > > implement this, we will need a mechanism to maintain change logs on base > > tables as well as Deferred maintenance. > > > > * Counting algorithm implementation > > > > There will be also discussions on counting-algorithm implementation. > > Firstly, the current patch treats "__ivm_count__" as a special column name > > in a somewhat ad hoc way. This is used when maintaining and accessing matviews, > > and when "SELECT * FROM ..." is issued, __ivm_count__ column is invisible for > > users. Maybe this name has to be inhibited in user tables. Is it acceptable > > to use such columns for IVM, and is there better way, if not? > > > > Secondly, a matview with duplicate tuples is replaces with a subquery which > > uses generate_series function. It does not have to be generate_series, and we > > can make a new set returning function for this. Anyway, this internal behaviour > > is visible in EXPLAIN results as shown in Example 3. Also, there is a > > performance impact because estimated rows number is wrong, and what is worse, > > the cost of join is not small when the size of matview is large. Therefore, we > > might have to add a new plan node for selecting from matviews rather than using > > such a special set returning function. > > > > > > Ragards, > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > > -- > Yugo Nagata <nagata@sraoss.co.jp> -- Yugo Nagata <nagata@sraoss.co.jp>
Attachment
Hi Greg, On Wed, 3 Apr 2019 17:41:36 -0400 Greg Stark <stark@mit.edu> wrote: > On Sun, 31 Mar 2019 at 23:22, Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > > Firstly, this will handle simple definition views which includes only > > selection, projection, and join. Standard aggregations (count, sum, avg, > > min, max) are not planned to be implemented in the first patch, but these > > are commonly used in materialized views, so I'll implement them later on. > > It's fine to not have all the features from day 1 of course. But I > just picked up this comment and the followup talking about splitting > AVG into SUM and COUNT and I had a comment. When you do look at > tackling aggregates I don't think you should restrict yourself to > these specific standard aggregations. We have all the necessary > abstractions to handle all aggregations that are feasible, see > https://www.postgresql.org/docs/devel/xaggr.html#XAGGR-MOVING-AGGREGATES > > What you need to do -- I think -- is store the "moving aggregate > state" before the final function. Then whenever a row is inserted or > deleted or updated (or whenever another column is updated which causes > the value to row to enter or leave the aggregation) apply either > aggtransfn or aggminvtransfn to the state. I'm not sure if you want to > apply the final function on every update or only lazily either may be > better in some usage. Thank you for your suggestion! I submitted the latest patch just now supporting some aggregate functions, but this supports only sum and count, and lacking a kind of generalization. However, I would like to refine this to support more general aggregate functions. I think your suggestions is helpful for me to do this. Thank you! Best regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote: > Attached is a WIP patch of IVM which supports some aggregate functions. Hi Nagata-san and Hoshiai-san, Thank you for working on this. I enjoyed your talk at PGCon. I've added Kevin Grittner just in case he missed this thread; he has talked often about implementing the counting algorithm, and he wrote the "trigger transition tables" feature to support exactly this. While integrating trigger transition tables with the new partition features, we had to make a number of decisions about how that should work, and we tried to come up with answers that would work for IMV, and I hope we made the right choices! I am quite interested to learn how IVM interacts with SERIALIZABLE. A couple of superficial review comments: + const char *aggname = get_func_name(aggref->aggfnoid); ... + else if (!strcmp(aggname, "sum")) I guess you need a more robust way to detect the supported aggregates than their name, or I guess some way for aggregates themselves to specify that they support this and somehow supply the extra logic. Perhaps I just waid what Greg Stark already said, except not as well. + elog(ERROR, "Aggrege function %s is not supported", aggname); s/Aggrege/aggregate/ Of course it is not helpful to comment on typos at this early stage, it's just that this one appears many times in the test output :-) +static bool +isIvmColumn(const char *s) +{ + char pre[7]; + + strlcpy(pre, s, sizeof(pre)); + return (strcmp(pre, "__ivm_") == 0); +} What about strncmp(s, "__ivm_", 6) == 0)? As for the question of how to reserve a namespace for system columns that won't clash with user columns, according to our manual the SQL standard doesn't allow $ in identifier names, and according to my copy SQL92 "intermediate SQL" doesn't allow identifiers that end in an underscore. I don't know what the best answer is but we should probably decide on a something based the standard. As for how to make internal columns invisible to SELECT *, previously there have been discussions about doing that using a new flag in pg_attribute: https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com + "WITH t AS (" + " SELECT diff.__ivm_count__, (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid" + ", %s" + " FROM %s AS mv, %s AS diff WHERE (%s) = (%s)" + "), updt AS (" + " UPDATE %s AS mv SET __ivm_count__ = mv.__ivm_count__ - t.__ivm_count__" + ", %s " + " FROM t WHERE mv.ctid = t.ctid AND NOT for_dlt" + ") DELETE FROM %s AS mv USING t WHERE mv.ctid = t.ctid AND for_dlt;", I fully understand that this is POC code, but I am curious about one thing. These queries that are executed by apply_delta() would need to be converted to C, or at least used reusable plans, right? Hmm, creating and dropping temporary tables every time is a clue that the ultimate form of this should be tuplestores and C code, I think, right? > Moreover, some regression test are added for aggregate functions support. > This is Hoshiai-san's work. Great. Next time you post a WIP patch, could you please fix this small compiler warning? describe.c: In function ‘describeOneTableDetails’: describe.c:3270:55: error: ‘*((void *)&tableinfo+48)’ may be used uninitialized in this function [-Werror=maybe-uninitialized] if (verbose && tableinfo.relkind == RELKIND_MATVIEW && tableinfo.isivm) ^ describe.c:1495:4: note: ‘*((void *)&tableinfo+48)’ was declared here } tableinfo; ^ Then our unofficial automatic CI system[1] will run these tests every day, which sometimes finds problems. [1] cfbot.cputube.org -- Thomas Munro https://enterprisedb.com
> As for how to make internal columns invisible to SELECT *, previously > there have been discussions about doing that using a new flag in > pg_attribute: > > https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com Now that I realized that there are several use cases for invisible columns, I think this is the way what we shoud go for. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> Attached is a WIP patch of IVM which supports some aggregate functions.
Hi Nagata-san and Hoshiai-san,
Thank you for working on this. I enjoyed your talk at PGCon. I've
added Kevin Grittner just in case he missed this thread; he has talked
often about implementing the counting algorithm, and he wrote the
"trigger transition tables" feature to support exactly this. While
integrating trigger transition tables with the new partition features,
we had to make a number of decisions about how that should work, and
we tried to come up with answers that would work for IMV, and I hope
we made the right choices!
I am quite interested to learn how IVM interacts with SERIALIZABLE.
A couple of superficial review comments:
Please find attached patches. The some of your review is reflected in patch too.
We manage and update IVM on following github repository.
https://github.com/sraoss/pgsql-ivm
you also can found latest WIP patch here.
+ const char *aggname = get_func_name(aggref->aggfnoid);
...
+ else if (!strcmp(aggname, "sum"))
I guess you need a more robust way to detect the supported aggregates
than their name, or I guess some way for aggregates themselves to
specify that they support this and somehow supply the extra logic.
Perhaps I just waid what Greg Stark already said, except not as well.
+ elog(ERROR, "Aggrege function %s is not
supported", aggname);
s/Aggrege/aggregate/
Of course it is not helpful to comment on typos at this early stage,
it's just that this one appears many times in the test output :-)
+static bool
+isIvmColumn(const char *s)
+{
+ char pre[7];
+
+ strlcpy(pre, s, sizeof(pre));
+ return (strcmp(pre, "__ivm_") == 0);
+}
What about strncmp(s, "__ivm_", 6) == 0)?
As for the question of how
to reserve a namespace for system columns that won't clash with user
columns, according to our manual the SQL standard doesn't allow $ in
identifier names, and according to my copy SQL92 "intermediate SQL"
doesn't allow identifiers that end in an underscore. I don't know
what the best answer is but we should probably decide on a something
based the standard.
As for how to make internal columns invisible to SELECT *, previously
there have been discussions about doing that using a new flag in
pg_attribute:
https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com
+ "WITH t AS ("
+ " SELECT diff.__ivm_count__,
(diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid"
+ ", %s"
+ " FROM %s AS mv, %s AS diff WHERE (%s) = (%s)"
+ "), updt AS ("
+ " UPDATE %s AS mv SET __ivm_count__ =
mv.__ivm_count__ - t.__ivm_count__"
+ ", %s "
+ " FROM t WHERE mv.ctid = t.ctid AND NOT for_dlt"
+ ") DELETE FROM %s AS mv USING t WHERE
mv.ctid = t.ctid AND for_dlt;",
I fully understand that this is POC code, but I am curious about one
thing. These queries that are executed by apply_delta() would need to
be converted to C, or at least used reusable plans, right? Hmm,
creating and dropping temporary tables every time is a clue that the
ultimate form of this should be tuplestores and C code, I think,
right?
> Moreover, some regression test are added for aggregate functions support.
> This is Hoshiai-san's work.
Great. Next time you post a WIP patch, could you please fix this
small compiler warning?
describe.c: In function ‘describeOneTableDetails’:
describe.c:3270:55: error: ‘*((void *)&tableinfo+48)’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
if (verbose && tableinfo.relkind == RELKIND_MATVIEW && tableinfo.isivm)
^
describe.c:1495:4: note: ‘*((void *)&tableinfo+48)’ was declared here
} tableinfo;
^
Then our unofficial automatic CI system[1] will run these tests every
day, which sometimes finds problems.
[1] cfbot.cputube.org
--
Thomas Munro
https://enterprisedb.com
On Wed, 10 Jul 2019 11:07:15 +0900 Takuma Hoshiai <takuma.hoshiai@gmail.com> wrote: > Hi Thomas, > > 2019年7月8日(月) 15:32 Thomas Munro <thomas.munro@gmail.com>: > > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > Attached is a WIP patch of IVM which supports some aggregate functions. > > > > Hi Nagata-san and Hoshiai-san, > > > > Thank you for working on this. I enjoyed your talk at PGCon. I've > > added Kevin Grittner just in case he missed this thread; he has talked > > often about implementing the counting algorithm, and he wrote the > > "trigger transition tables" feature to support exactly this. While > > integrating trigger transition tables with the new partition features, > > we had to make a number of decisions about how that should work, and > > we tried to come up with answers that would work for IMV, and I hope > > we made the right choices! > > > > I am quite interested to learn how IVM interacts with SERIALIZABLE. > > > > Nagata-san has been studying this. Nagata-san, any comment? > > > A couple of superficial review comments: > > Thank you for your review comments. > Please find attached patches. The some of your review is reflected in patch > too. Sorry, I forgot attaching patch. In addition, avg() function is supported newly. We found a issue when use avg() with IVM, added its reproduction case in regressio test. We are being to fix now. > We manage and update IVM on following github repository. > https://github.com/sraoss/pgsql-ivm > you also can found latest WIP patch here. > > > > + const char *aggname = get_func_name(aggref->aggfnoid); > > ... > > + else if (!strcmp(aggname, "sum")) > > > > I guess you need a more robust way to detect the supported aggregates > > than their name, or I guess some way for aggregates themselves to > > specify that they support this and somehow supply the extra logic. > > Perhaps I just waid what Greg Stark already said, except not as well. > > > > We have recognized the issue and are welcome any input. > > > + elog(ERROR, "Aggrege function %s is not > > supported", aggname); > > > > s/Aggrege/aggregate/ > > > > I fixed this typo. > > > Of course it is not helpful to comment on typos at this early stage, > > it's just that this one appears many times in the test output :-) > > > > +static bool > > +isIvmColumn(const char *s) > > +{ > > + char pre[7]; > > + > > + strlcpy(pre, s, sizeof(pre)); > > + return (strcmp(pre, "__ivm_") == 0); > > +} > > > > What about strncmp(s, "__ivm_", 6) == 0)? > > > I agree with you, I fixed it. > > > As for the question of how > > to reserve a namespace for system columns that won't clash with user > > columns, according to our manual the SQL standard doesn't allow $ in > > identifier names, and according to my copy SQL92 "intermediate SQL" > > doesn't allow identifiers that end in an underscore. I don't know > > what the best answer is but we should probably decide on a something > > based the standard. > > > > As for how to make internal columns invisible to SELECT *, previously > > there have been discussions about doing that using a new flag in > > pg_attribute: > > > > > > https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > > > > + "WITH t AS (" > > + " SELECT diff.__ivm_count__, > > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid" > > + ", %s" > > + " FROM %s AS mv, %s AS diff WHERE (%s) = > > (%s)" > > + "), updt AS (" > > + " UPDATE %s AS mv SET __ivm_count__ = > > mv.__ivm_count__ - t.__ivm_count__" > > + ", %s " > > + " FROM t WHERE mv.ctid = t.ctid AND NOT > > for_dlt" > > + ") DELETE FROM %s AS mv USING t WHERE > > mv.ctid = t.ctid AND for_dlt;", > > > > I fully understand that this is POC code, but I am curious about one > > thing. These queries that are executed by apply_delta() would need to > > be converted to C, or at least used reusable plans, right? Hmm, > > creating and dropping temporary tables every time is a clue that the > > ultimate form of this should be tuplestores and C code, I think, > > right? > > > > Nagata-san is investing the issue. > > > > > Moreover, some regression test are added for aggregate functions support. > > > This is Hoshiai-san's work. > > > > Great. Next time you post a WIP patch, could you please fix this > > small compiler warning? > > > > describe.c: In function ‘describeOneTableDetails’: > > describe.c:3270:55: error: ‘*((void *)&tableinfo+48)’ may be used > > uninitialized in this function [-Werror=maybe-uninitialized] > > if (verbose && tableinfo.relkind == RELKIND_MATVIEW && tableinfo.isivm) > > ^ > > describe.c:1495:4: note: ‘*((void *)&tableinfo+48)’ was declared here > > } tableinfo; > > ^ > > > > It is fixed too. > > > Then our unofficial automatic CI system[1] will run these tests every > > day, which sometimes finds problems. > > > > [1] cfbot.cputube.org > > > > -- > > Thomas Munro > > https://enterprisedb.com > > > > > Best regards, > > Takuma Hoshiai -- Takuma Hoshiai <hoshiai@sraoss.co.jp>
Attachment
> I am quite interested to learn how IVM interacts with SERIALIZABLE. Just for a fun, I have added: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; right after every BEGIN; in incremental_matview.sql in regression test and it seems it works. > A couple of superficial review comments: > > + const char *aggname = get_func_name(aggref->aggfnoid); > ... > + else if (!strcmp(aggname, "sum")) > > I guess you need a more robust way to detect the supported aggregates > than their name, or I guess some way for aggregates themselves to > specify that they support this and somehow supply the extra logic. > Perhaps I just waid what Greg Stark already said, except not as well. I guess we could use moving aggregate (or partial aggregate?) functions for this purpose, but then we need to run executor directly rather using SPI. It needs more codes... > + "WITH t AS (" > + " SELECT diff.__ivm_count__, > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid" > + ", %s" > + " FROM %s AS mv, %s AS diff WHERE (%s) = (%s)" > + "), updt AS (" > + " UPDATE %s AS mv SET __ivm_count__ = > mv.__ivm_count__ - t.__ivm_count__" > + ", %s " > + " FROM t WHERE mv.ctid = t.ctid AND NOT for_dlt" > + ") DELETE FROM %s AS mv USING t WHERE > mv.ctid = t.ctid AND for_dlt;", > > I fully understand that this is POC code, but I am curious about one > thing. These queries that are executed by apply_delta() would need to > be converted to C, or at least used reusable plans, right? Hmm, > creating and dropping temporary tables every time is a clue that the > ultimate form of this should be tuplestores and C code, I think, > right? Yes, we could reuse the temp tables and plans. > Then our unofficial automatic CI system[1] will run these tests every > day, which sometimes finds problems. > > [1] cfbot.cputube.org I appreciate that you provide the system. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hi Thomas, Thank you for your review and discussion on this patch! > > 2019年7月8日(月) 15:32 Thomas Munro <thomas.munro@gmail.com>: > > > > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > > Attached is a WIP patch of IVM which supports some aggregate functions. > > > > > > Hi Nagata-san and Hoshiai-san, > > > > > > Thank you for working on this. I enjoyed your talk at PGCon. I've > > > added Kevin Grittner just in case he missed this thread; he has talked > > > often about implementing the counting algorithm, and he wrote the > > > "trigger transition tables" feature to support exactly this. While > > > integrating trigger transition tables with the new partition features, > > > we had to make a number of decisions about how that should work, and > > > we tried to come up with answers that would work for IMV, and I hope > > > we made the right choices! Transition tables is a great feature. I am now using this in my implementation of IVM, but the first time I used this feature was when I implemented a PoC for extending view updatability of PostgreSQL[1]. At that time, I didn't know that this feature is made originally aiming to support IVM. [1] https://www.pgcon.org/2017/schedule/events/1074.en.html I think transition tables is a good choice to implement a statement level immediate view maintenance where materialized views are refreshed in a statement level after trigger. However, when implementing a transaction level immediate view maintenance where views are refreshed per transaction, or deferred view maintenance, we can't update views in a after trigger, and we will need an infrastructure to manage change logs of base tables. Transition tables can be used to collect these logs, but using logical decoding of WAL is another candidate. In any way, if these logs can be collected in a tuplestore, we might able to convert this to "ephemeral named relation (ENR)" and use this to calculate diff sets for views. > > > > > > I am quite interested to learn how IVM interacts with SERIALIZABLE. > > > > > > > Nagata-san has been studying this. Nagata-san, any comment? In SERIALIZABLE or REPEATABLE READ level, table changes occurred in other ransactions are not visible, so views can not be maintained correctly in AFTER triggers. If a view is defined on two tables and each table is modified in different concurrent transactions respectively, the result of view maintenance done in trigger functions can be incorrect due to the race condition. This is the reason why such transactions are aborted immediately in that case in my current implementation. One idea to resolve this is performing view maintenance in two phases. Firstly, views are updated using only table changes visible in this transaction. Then, just after this transaction is committed, views have to be updated additionally using changes happened in other transactions to keep consistency. This is a just idea, but to implement this idea, I think, we will need keep to keep and maintain change logs. > > > A couple of superficial review comments: > > > + const char *aggname = get_func_name(aggref->aggfnoid); > > > ... > > > + else if (!strcmp(aggname, "sum")) > > > > > > I guess you need a more robust way to detect the supported aggregates > > > than their name, or I guess some way for aggregates themselves to > > > specify that they support this and somehow supply the extra logic. > > > Perhaps I just waid what Greg Stark already said, except not as well. Yes. Using name is not robust because users can make same name aggregates like sum(text) (although I am not sure this makes sense). We can use oids instead of names, but it would be nice to extend pg_aggregate and add new attributes for informing that this supports IVM and for providing functions for IVM logic. > > > As for the question of how > > > to reserve a namespace for system columns that won't clash with user > > > columns, according to our manual the SQL standard doesn't allow $ in > > > identifier names, and according to my copy SQL92 "intermediate SQL" > > > doesn't allow identifiers that end in an underscore. I don't know > > > what the best answer is but we should probably decide on a something > > > based the standard. Ok, so we should use "__ivm_count__" since this ends in "_" at least. Another idea is that users specify the name of this special column when defining materialized views with IVM support. This way can avoid the conflict because users will specify a name which does not appear in the target list. As for aggregates supports, it may be also possible to make it a restriction that count(expr) must be in the target list explicitly when sum(expr) or avg(expr) is included, instead of making hidden column like __ivm_count_sum__, like Oracle does. > > > > > > As for how to make internal columns invisible to SELECT *, previously > > > there have been discussions about doing that using a new flag in > > > pg_attribute: > > > > > > > > > https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com I agree implementing this feature in PostgreSQL since there are at least a few use cases, IVM and temporal database. > > > > > > + "WITH t AS (" > > > + " SELECT diff.__ivm_count__, > > > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid" > > > + ", %s" > > > + " FROM %s AS mv, %s AS diff WHERE (%s) = > > > (%s)" > > > + "), updt AS (" > > > + " UPDATE %s AS mv SET __ivm_count__ = > > > mv.__ivm_count__ - t.__ivm_count__" > > > + ", %s " > > > + " FROM t WHERE mv.ctid = t.ctid AND NOT > > > for_dlt" > > > + ") DELETE FROM %s AS mv USING t WHERE > > > mv.ctid = t.ctid AND for_dlt;", > > > > > > I fully understand that this is POC code, but I am curious about one > > > thing. These queries that are executed by apply_delta() would need to > > > be converted to C, or at least used reusable plans, right? Hmm, > > > creating and dropping temporary tables every time is a clue that the > > > ultimate form of this should be tuplestores and C code, I think, > > > right? I used SPI just because REFRESH CONCURRENTLY uses this, but, as you said, it is inefficient to create/drop temp tables and perform parse/plan every times. It seems to be enough to perform this once when creating materialized views or at the first maintenance time. Best regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Hi, I've updated the wiki page of Incremental View Maintenance. https://wiki.postgresql.org/wiki/Incremental_View_Maintenance On Thu, 11 Jul 2019 13:28:04 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > Hi Thomas, > > Thank you for your review and discussion on this patch! > > > > 2019年7月8日(月) 15:32 Thomas Munro <thomas.munro@gmail.com>: > > > > > > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > > > Attached is a WIP patch of IVM which supports some aggregate functions. > > > > > > > > Hi Nagata-san and Hoshiai-san, > > > > > > > > Thank you for working on this. I enjoyed your talk at PGCon. I've > > > > added Kevin Grittner just in case he missed this thread; he has talked > > > > often about implementing the counting algorithm, and he wrote the > > > > "trigger transition tables" feature to support exactly this. While > > > > integrating trigger transition tables with the new partition features, > > > > we had to make a number of decisions about how that should work, and > > > > we tried to come up with answers that would work for IMV, and I hope > > > > we made the right choices! > > Transition tables is a great feature. I am now using this in my implementation > of IVM, but the first time I used this feature was when I implemented a PoC > for extending view updatability of PostgreSQL[1]. At that time, I didn't know > that this feature is made originally aiming to support IVM. > > [1] https://www.pgcon.org/2017/schedule/events/1074.en.html > > I think transition tables is a good choice to implement a statement level > immediate view maintenance where materialized views are refreshed in a statement > level after trigger. However, when implementing a transaction level immediate > view maintenance where views are refreshed per transaction, or deferred view > maintenance, we can't update views in a after trigger, and we will need an > infrastructure to manage change logs of base tables. Transition tables can be > used to collect these logs, but using logical decoding of WAL is another candidate. > In any way, if these logs can be collected in a tuplestore, we might able to > convert this to "ephemeral named relation (ENR)" and use this to calculate diff > sets for views. > > > > > > > > > I am quite interested to learn how IVM interacts with SERIALIZABLE. > > > > > > > > > > Nagata-san has been studying this. Nagata-san, any comment? > > In SERIALIZABLE or REPEATABLE READ level, table changes occurred in other > ransactions are not visible, so views can not be maintained correctly in AFTER > triggers. If a view is defined on two tables and each table is modified in > different concurrent transactions respectively, the result of view maintenance > done in trigger functions can be incorrect due to the race condition. This is the > reason why such transactions are aborted immediately in that case in my current > implementation. > > One idea to resolve this is performing view maintenance in two phases. Firstly, > views are updated using only table changes visible in this transaction. Then, > just after this transaction is committed, views have to be updated additionally > using changes happened in other transactions to keep consistency. This is a just > idea, but to implement this idea, I think, we will need keep to keep and > maintain change logs. > > > > > A couple of superficial review comments: > > > > > > > + const char *aggname = get_func_name(aggref->aggfnoid); > > > > ... > > > > + else if (!strcmp(aggname, "sum")) > > > > > > > > I guess you need a more robust way to detect the supported aggregates > > > > than their name, or I guess some way for aggregates themselves to > > > > specify that they support this and somehow supply the extra logic. > > > > Perhaps I just waid what Greg Stark already said, except not as well. > > Yes. Using name is not robust because users can make same name aggregates like > sum(text) (although I am not sure this makes sense). We can use oids instead > of names, but it would be nice to extend pg_aggregate and add new attributes > for informing that this supports IVM and for providing functions for IVM logic. > > > > > As for the question of how > > > > to reserve a namespace for system columns that won't clash with user > > > > columns, according to our manual the SQL standard doesn't allow $ in > > > > identifier names, and according to my copy SQL92 "intermediate SQL" > > > > doesn't allow identifiers that end in an underscore. I don't know > > > > what the best answer is but we should probably decide on a something > > > > based the standard. > > Ok, so we should use "__ivm_count__" since this ends in "_" at least. > > Another idea is that users specify the name of this special column when > defining materialized views with IVM support. This way can avoid the conflict > because users will specify a name which does not appear in the target list. > > As for aggregates supports, it may be also possible to make it a restriction > that count(expr) must be in the target list explicitly when sum(expr) or > avg(expr) is included, instead of making hidden column like __ivm_count_sum__, > like Oracle does. > > > > > > > > > As for how to make internal columns invisible to SELECT *, previously > > > > there have been discussions about doing that using a new flag in > > > > pg_attribute: > > > > > > > > > > > > https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > > I agree implementing this feature in PostgreSQL since there are at least a few > use cases, IVM and temporal database. > > > > > > > > > + "WITH t AS (" > > > > + " SELECT diff.__ivm_count__, > > > > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid" > > > > + ", %s" > > > > + " FROM %s AS mv, %s AS diff WHERE (%s) = > > > > (%s)" > > > > + "), updt AS (" > > > > + " UPDATE %s AS mv SET __ivm_count__ = > > > > mv.__ivm_count__ - t.__ivm_count__" > > > > + ", %s " > > > > + " FROM t WHERE mv.ctid = t.ctid AND NOT > > > > for_dlt" > > > > + ") DELETE FROM %s AS mv USING t WHERE > > > > mv.ctid = t.ctid AND for_dlt;", > > > > > > > > I fully understand that this is POC code, but I am curious about one > > > > thing. These queries that are executed by apply_delta() would need to > > > > be converted to C, or at least used reusable plans, right? Hmm, > > > > creating and dropping temporary tables every time is a clue that the > > > > ultimate form of this should be tuplestores and C code, I think, > > > > right? > > I used SPI just because REFRESH CONCURRENTLY uses this, but, as you said, > it is inefficient to create/drop temp tables and perform parse/plan every times. > It seems to be enough to perform this once when creating materialized views or > at the first maintenance time. > > > Best regards, > Yugo Nagata > > > -- > Yugo Nagata <nagata@sraoss.co.jp> > > -- Yugo Nagata <nagata@sraoss.co.jp>
Hi, Attached is the latest patch for supporting min and max aggregate functions. When new tuples are inserted into base tables, if new values are small (for min) or large (for max), matview just have to be updated with these new values. Otherwise, old values just remains. However, in the case of deletion, this is more complicated. If deleted values exists in matview as current min or max, we have to recomputate new min or max values from base tables for affected groups, and matview should be updated with these recomputated values. Also, regression tests for min/max are also added. In addition, incremental update algorithm of avg aggregate values is a bit improved. If an avg result in materialized views is updated incrementally y using the old avg value, numerical errors in avg values are accumulated and the values get wrong eventually. To prevent this, both of sum and count values are contained in views as hidden columns and use them to calculate new avg value instead of using old avg values. Regards, On Fri, 26 Jul 2019 11:35:53 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > Hi, > > I've updated the wiki page of Incremental View Maintenance. > > https://wiki.postgresql.org/wiki/Incremental_View_Maintenance > > On Thu, 11 Jul 2019 13:28:04 +0900 > Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > Hi Thomas, > > > > Thank you for your review and discussion on this patch! > > > > > > 2019年7月8日(月) 15:32 Thomas Munro <thomas.munro@gmail.com>: > > > > > > > > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > > > > Attached is a WIP patch of IVM which supports some aggregate functions. > > > > > > > > > > Hi Nagata-san and Hoshiai-san, > > > > > > > > > > Thank you for working on this. I enjoyed your talk at PGCon. I've > > > > > added Kevin Grittner just in case he missed this thread; he has talked > > > > > often about implementing the counting algorithm, and he wrote the > > > > > "trigger transition tables" feature to support exactly this. While > > > > > integrating trigger transition tables with the new partition features, > > > > > we had to make a number of decisions about how that should work, and > > > > > we tried to come up with answers that would work for IMV, and I hope > > > > > we made the right choices! > > > > Transition tables is a great feature. I am now using this in my implementation > > of IVM, but the first time I used this feature was when I implemented a PoC > > for extending view updatability of PostgreSQL[1]. At that time, I didn't know > > that this feature is made originally aiming to support IVM. > > > > [1] https://www.pgcon.org/2017/schedule/events/1074.en.html > > > > I think transition tables is a good choice to implement a statement level > > immediate view maintenance where materialized views are refreshed in a statement > > level after trigger. However, when implementing a transaction level immediate > > view maintenance where views are refreshed per transaction, or deferred view > > maintenance, we can't update views in a after trigger, and we will need an > > infrastructure to manage change logs of base tables. Transition tables can be > > used to collect these logs, but using logical decoding of WAL is another candidate. > > In any way, if these logs can be collected in a tuplestore, we might able to > > convert this to "ephemeral named relation (ENR)" and use this to calculate diff > > sets for views. > > > > > > > > > > > > I am quite interested to learn how IVM interacts with SERIALIZABLE. > > > > > > > > > > > > > Nagata-san has been studying this. Nagata-san, any comment? > > > > In SERIALIZABLE or REPEATABLE READ level, table changes occurred in other > > ransactions are not visible, so views can not be maintained correctly in AFTER > > triggers. If a view is defined on two tables and each table is modified in > > different concurrent transactions respectively, the result of view maintenance > > done in trigger functions can be incorrect due to the race condition. This is the > > reason why such transactions are aborted immediately in that case in my current > > implementation. > > > > One idea to resolve this is performing view maintenance in two phases. Firstly, > > views are updated using only table changes visible in this transaction. Then, > > just after this transaction is committed, views have to be updated additionally > > using changes happened in other transactions to keep consistency. This is a just > > idea, but to implement this idea, I think, we will need keep to keep and > > maintain change logs. > > > > > > > A couple of superficial review comments: > > > > > > > > > > > + const char *aggname = get_func_name(aggref->aggfnoid); > > > > > ... > > > > > + else if (!strcmp(aggname, "sum")) > > > > > > > > > > I guess you need a more robust way to detect the supported aggregates > > > > > than their name, or I guess some way for aggregates themselves to > > > > > specify that they support this and somehow supply the extra logic. > > > > > Perhaps I just waid what Greg Stark already said, except not as well. > > > > Yes. Using name is not robust because users can make same name aggregates like > > sum(text) (although I am not sure this makes sense). We can use oids instead > > of names, but it would be nice to extend pg_aggregate and add new attributes > > for informing that this supports IVM and for providing functions for IVM logic. > > > > > > > As for the question of how > > > > > to reserve a namespace for system columns that won't clash with user > > > > > columns, according to our manual the SQL standard doesn't allow $ in > > > > > identifier names, and according to my copy SQL92 "intermediate SQL" > > > > > doesn't allow identifiers that end in an underscore. I don't know > > > > > what the best answer is but we should probably decide on a something > > > > > based the standard. > > > > Ok, so we should use "__ivm_count__" since this ends in "_" at least. > > > > Another idea is that users specify the name of this special column when > > defining materialized views with IVM support. This way can avoid the conflict > > because users will specify a name which does not appear in the target list. > > > > As for aggregates supports, it may be also possible to make it a restriction > > that count(expr) must be in the target list explicitly when sum(expr) or > > avg(expr) is included, instead of making hidden column like __ivm_count_sum__, > > like Oracle does. > > > > > > > > > > > > As for how to make internal columns invisible to SELECT *, previously > > > > > there have been discussions about doing that using a new flag in > > > > > pg_attribute: > > > > > > > > > > > > > > > https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > > > > I agree implementing this feature in PostgreSQL since there are at least a few > > use cases, IVM and temporal database. > > > > > > > > > > > > + "WITH t AS (" > > > > > + " SELECT diff.__ivm_count__, > > > > > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid" > > > > > + ", %s" > > > > > + " FROM %s AS mv, %s AS diff WHERE (%s) = > > > > > (%s)" > > > > > + "), updt AS (" > > > > > + " UPDATE %s AS mv SET __ivm_count__ = > > > > > mv.__ivm_count__ - t.__ivm_count__" > > > > > + ", %s " > > > > > + " FROM t WHERE mv.ctid = t.ctid AND NOT > > > > > for_dlt" > > > > > + ") DELETE FROM %s AS mv USING t WHERE > > > > > mv.ctid = t.ctid AND for_dlt;", > > > > > > > > > > I fully understand that this is POC code, but I am curious about one > > > > > thing. These queries that are executed by apply_delta() would need to > > > > > be converted to C, or at least used reusable plans, right? Hmm, > > > > > creating and dropping temporary tables every time is a clue that the > > > > > ultimate form of this should be tuplestores and C code, I think, > > > > > right? > > > > I used SPI just because REFRESH CONCURRENTLY uses this, but, as you said, > > it is inefficient to create/drop temp tables and perform parse/plan every times. > > It seems to be enough to perform this once when creating materialized views or > > at the first maintenance time. > > > > > > Best regards, > > Yugo Nagata > > > > > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > > > > > > -- > Yugo Nagata <nagata@sraoss.co.jp> > > -- Yugo Nagata <nagata@sraoss.co.jp>
Attachment
It's not mentioned below but some bugs including seg fault when --enable-casser is enabled was also fixed in this patch. BTW, I found a bug with min/max support in this patch and I believe Yugo is working on it. Details: https://github.com/sraoss/pgsql-ivm/issues/20 Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp From: Yugo Nagata <nagata@sraoss.co.jp> Subject: Re: Implementing Incremental View Maintenance Date: Wed, 31 Jul 2019 18:08:51 +0900 Message-ID: <20190731180851.73856441d8abb494bf5e68e7@sraoss.co.jp> > Hi, > > Attached is the latest patch for supporting min and max aggregate functions. > > When new tuples are inserted into base tables, if new values are small > (for min) or large (for max), matview just have to be updated with these > new values. Otherwise, old values just remains. > > However, in the case of deletion, this is more complicated. If deleted > values exists in matview as current min or max, we have to recomputate > new min or max values from base tables for affected groups, and matview > should be updated with these recomputated values. > > Also, regression tests for min/max are also added. > > In addition, incremental update algorithm of avg aggregate values is a bit > improved. If an avg result in materialized views is updated incrementally > y using the old avg value, numerical errors in avg values are accumulated > and the values get wrong eventually. To prevent this, both of sum and count > values are contained in views as hidden columns and use them to calculate > new avg value instead of using old avg values. > > Regards, > > On Fri, 26 Jul 2019 11:35:53 +0900 > Yugo Nagata <nagata@sraoss.co.jp> wrote: > >> Hi, >> >> I've updated the wiki page of Incremental View Maintenance. >> >> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance >> >> On Thu, 11 Jul 2019 13:28:04 +0900 >> Yugo Nagata <nagata@sraoss.co.jp> wrote: >> >> > Hi Thomas, >> > >> > Thank you for your review and discussion on this patch! >> > >> > > > 2019年7月8日(月) 15:32 Thomas Munro <thomas.munro@gmail.com>: >> > > > >> > > > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote: >> > > > > > Attached is a WIP patch of IVM which supports some aggregate functions. >> > > > > >> > > > > Hi Nagata-san and Hoshiai-san, >> > > > > >> > > > > Thank you for working on this. I enjoyed your talk at PGCon. I've >> > > > > added Kevin Grittner just in case he missed this thread; he has talked >> > > > > often about implementing the counting algorithm, and he wrote the >> > > > > "trigger transition tables" feature to support exactly this. While >> > > > > integrating trigger transition tables with the new partition features, >> > > > > we had to make a number of decisions about how that should work, and >> > > > > we tried to come up with answers that would work for IMV, and I hope >> > > > > we made the right choices! >> > >> > Transition tables is a great feature. I am now using this in my implementation >> > of IVM, but the first time I used this feature was when I implemented a PoC >> > for extending view updatability of PostgreSQL[1]. At that time, I didn't know >> > that this feature is made originally aiming to support IVM. >> > >> > [1] https://www.pgcon.org/2017/schedule/events/1074.en.html >> > >> > I think transition tables is a good choice to implement a statement level >> > immediate view maintenance where materialized views are refreshed in a statement >> > level after trigger. However, when implementing a transaction level immediate >> > view maintenance where views are refreshed per transaction, or deferred view >> > maintenance, we can't update views in a after trigger, and we will need an >> > infrastructure to manage change logs of base tables. Transition tables can be >> > used to collect these logs, but using logical decoding of WAL is another candidate. >> > In any way, if these logs can be collected in a tuplestore, we might able to >> > convert this to "ephemeral named relation (ENR)" and use this to calculate diff >> > sets for views. >> > >> > > > > >> > > > > I am quite interested to learn how IVM interacts with SERIALIZABLE. >> > > > > >> > > > >> > > > Nagata-san has been studying this. Nagata-san, any comment? >> > >> > In SERIALIZABLE or REPEATABLE READ level, table changes occurred in other >> > ransactions are not visible, so views can not be maintained correctly in AFTER >> > triggers. If a view is defined on two tables and each table is modified in >> > different concurrent transactions respectively, the result of view maintenance >> > done in trigger functions can be incorrect due to the race condition. This is the >> > reason why such transactions are aborted immediately in that case in my current >> > implementation. >> > >> > One idea to resolve this is performing view maintenance in two phases. Firstly, >> > views are updated using only table changes visible in this transaction. Then, >> > just after this transaction is committed, views have to be updated additionally >> > using changes happened in other transactions to keep consistency. This is a just >> > idea, but to implement this idea, I think, we will need keep to keep and >> > maintain change logs. >> > >> > > > > A couple of superficial review comments: >> > >> > >> > >> > > > > + const char *aggname = get_func_name(aggref->aggfnoid); >> > > > > ... >> > > > > + else if (!strcmp(aggname, "sum")) >> > > > > >> > > > > I guess you need a more robust way to detect the supported aggregates >> > > > > than their name, or I guess some way for aggregates themselves to >> > > > > specify that they support this and somehow supply the extra logic. >> > > > > Perhaps I just waid what Greg Stark already said, except not as well. >> > >> > Yes. Using name is not robust because users can make same name aggregates like >> > sum(text) (although I am not sure this makes sense). We can use oids instead >> > of names, but it would be nice to extend pg_aggregate and add new attributes >> > for informing that this supports IVM and for providing functions for IVM logic. >> > >> > > > > As for the question of how >> > > > > to reserve a namespace for system columns that won't clash with user >> > > > > columns, according to our manual the SQL standard doesn't allow $ in >> > > > > identifier names, and according to my copy SQL92 "intermediate SQL" >> > > > > doesn't allow identifiers that end in an underscore. I don't know >> > > > > what the best answer is but we should probably decide on a something >> > > > > based the standard. >> > >> > Ok, so we should use "__ivm_count__" since this ends in "_" at least. >> > >> > Another idea is that users specify the name of this special column when >> > defining materialized views with IVM support. This way can avoid the conflict >> > because users will specify a name which does not appear in the target list. >> > >> > As for aggregates supports, it may be also possible to make it a restriction >> > that count(expr) must be in the target list explicitly when sum(expr) or >> > avg(expr) is included, instead of making hidden column like __ivm_count_sum__, >> > like Oracle does. >> > >> > > > > >> > > > > As for how to make internal columns invisible to SELECT *, previously >> > > > > there have been discussions about doing that using a new flag in >> > > > > pg_attribute: >> > > > > >> > > > > >> > > > > https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com >> > >> > I agree implementing this feature in PostgreSQL since there are at least a few >> > use cases, IVM and temporal database. >> > >> > > > > >> > > > > + "WITH t AS (" >> > > > > + " SELECT diff.__ivm_count__, >> > > > > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid" >> > > > > + ", %s" >> > > > > + " FROM %s AS mv, %s AS diff WHERE (%s) = >> > > > > (%s)" >> > > > > + "), updt AS (" >> > > > > + " UPDATE %s AS mv SET __ivm_count__ = >> > > > > mv.__ivm_count__ - t.__ivm_count__" >> > > > > + ", %s " >> > > > > + " FROM t WHERE mv.ctid = t.ctid AND NOT >> > > > > for_dlt" >> > > > > + ") DELETE FROM %s AS mv USING t WHERE >> > > > > mv.ctid = t.ctid AND for_dlt;", >> > > > > >> > > > > I fully understand that this is POC code, but I am curious about one >> > > > > thing. These queries that are executed by apply_delta() would need to >> > > > > be converted to C, or at least used reusable plans, right? Hmm, >> > > > > creating and dropping temporary tables every time is a clue that the >> > > > > ultimate form of this should be tuplestores and C code, I think, >> > > > > right? >> > >> > I used SPI just because REFRESH CONCURRENTLY uses this, but, as you said, >> > it is inefficient to create/drop temp tables and perform parse/plan every times. >> > It seems to be enough to perform this once when creating materialized views or >> > at the first maintenance time. >> > >> > >> > Best regards, >> > Yugo Nagata >> > >> > >> > -- >> > Yugo Nagata <nagata@sraoss.co.jp> >> > >> > >> >> >> -- >> Yugo Nagata <nagata@sraoss.co.jp> >> >> > > > -- > Yugo Nagata <nagata@sraoss.co.jp>
On 2019-Aug-06, Tatsuo Ishii wrote: > It's not mentioned below but some bugs including seg fault when > --enable-casser is enabled was also fixed in this patch. > > BTW, I found a bug with min/max support in this patch and I believe > Yugo is working on it. Details: > https://github.com/sraoss/pgsql-ivm/issues/20 So is he posting an updated patch soon? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 2019-Aug-06, Tatsuo Ishii wrote: > >> It's not mentioned below but some bugs including seg fault when >> --enable-casser is enabled was also fixed in this patch. >> >> BTW, I found a bug with min/max support in this patch and I believe >> Yugo is working on it. Details: >> https://github.com/sraoss/pgsql-ivm/issues/20 > > So is he posting an updated patch soon? I think he is going to post an updated patch by the end of this month. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hi Paul, Thank you for your suggestion. On Sun, 15 Sep 2019 11:52:22 -0600 Paul Draper <paulddraper@gmail.com> wrote: > As I understand it, the current patch performs immediate IVM using AFTER > STATEMENT trigger transition tables. > > However, multiple tables can be modified *before* AFTER STATEMENT triggers > are fired. > > CREATE TABLE example1 (a int); > CREATE TABLE example2 (a int); > > CREATE INCREMENTAL MATERIALIZED VIEW mv AS > SELECT example1.a, example2.a > FROM example1 JOIN example2 ON a; > > WITH > insert1 AS (INSERT INTO example1 VALUES (1)), > insert2 AS (INSERT INTO example2 VALUES (1)) > SELECT NULL; > > Changes to example1 are visible in an AFTER STATEMENT trigger on example2, > and vice versa. Would this not result in the (1, 1) tuple being > "double-counted"? > > IVM needs to either: > > (1) Evaluate deltas "serially' (e.g. EACH ROW triggers) > > (2) Have simultaneous access to multiple deltas: > delta_mv = example1 x delta_example2 + example2 x delta_example1 - > delta_example1 x delta_example2 > > This latter method is the "logged" approach that has been discussed for > deferred evaluation. > > tl;dr It seems that AFTER STATEMENT triggers required a deferred-like > implementation anyway. You are right, the latest patch doesn't support the situation where multiple tables are modified in a query. I noticed this when working on self-join, which also virtually need to handle multiple table modification. I am now working on this issue and the next patch will enable to handle this situation. I plan to submit the patch during this month. Roughly speaking, in the new implementation, AFTER STATEMENT triggers are used to collect information of modified table and its changes (= transition tables), and then the only last trigger updates the view. This will avoid the double-counting. I think this implementation also would be a base of deferred approach implementation in future where "logs" are used instead of transition tables. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Hi Paul,
Thank you for your suggestion.
On Sun, 15 Sep 2019 11:52:22 -0600
Paul Draper <paulddraper@gmail.com> wrote:
> As I understand it, the current patch performs immediate IVM using AFTER
> STATEMENT trigger transition tables.
>
> However, multiple tables can be modified *before* AFTER STATEMENT triggers
> are fired.
>
> CREATE TABLE example1 (a int);
> CREATE TABLE example2 (a int);
>
> CREATE INCREMENTAL MATERIALIZED VIEW mv AS
> SELECT example1.a, example2.a
> FROM example1 JOIN example2 ON a;
>
> WITH
> insert1 AS (INSERT INTO example1 VALUES (1)),
> insert2 AS (INSERT INTO example2 VALUES (1))
> SELECT NULL;
>
> Changes to example1 are visible in an AFTER STATEMENT trigger on example2,
> and vice versa. Would this not result in the (1, 1) tuple being
> "double-counted"?
>
> IVM needs to either:
>
> (1) Evaluate deltas "serially' (e.g. EACH ROW triggers)
>
> (2) Have simultaneous access to multiple deltas:
> delta_mv = example1 x delta_example2 + example2 x delta_example1 -
> delta_example1 x delta_example2
>
> This latter method is the "logged" approach that has been discussed for
> deferred evaluation.
>
> tl;dr It seems that AFTER STATEMENT triggers required a deferred-like
> implementation anyway.
You are right, the latest patch doesn't support the situation where
multiple tables are modified in a query. I noticed this when working
on self-join, which also virtually need to handle multiple table
modification.
I am now working on this issue and the next patch will enable to handle
this situation. I plan to submit the patch during this month. Roughly
speaking, in the new implementation, AFTER STATEMENT triggers are used to
collect information of modified table and its changes (= transition tables),
and then the only last trigger updates the view. This will avoid the
double-counting. I think this implementation also would be a base of
deferred approach implementation in future where "logs" are used instead
of transition tables.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
> Have you had any thoughts for more than two joined tables? I am not sure what you are asking here but if you are asking if IVM supports two or more tables involved in a join, we already support it: DROP MATERIALIZED VIEW mv1; DROP MATERIALIZED VIEW DROP TABLE t1; DROP TABLE DROP TABLE t2; DROP TABLE DROP TABLE t3; DROP TABLE CREATE TABLE t1(i int, j int); CREATE TABLE CREATE TABLE t2(k int, l int); CREATE TABLE CREATE TABLE t3(m int, n int); CREATE TABLE INSERT INTO t1 VALUES(1,10),(2,11); INSERT 0 2 INSERT INTO t2 VALUES(1,20),(2,21); INSERT 0 2 INSERT INTO t3 VALUES(1,30),(2,31); INSERT 0 2 CREATE INCREMENTAL MATERIALIZED VIEW mv1 AS SELECT * FROM t1 INNER JOIN t2 ON t1.i = t2.k INNER JOIN t3 ON t1.i = t3.m; SELECT 2 SELECT * FROM mv1; i | j | k | l | m | n ---+----+---+----+---+---- 1 | 10 | 1 | 20 | 1 | 30 2 | 11 | 2 | 21 | 2 | 31 (2 rows) UPDATE t1 SET j = 15 WHERE i = 1; UPDATE 1 SELECT * FROM mv1; i | j | k | l | m | n ---+----+---+----+---+---- 2 | 11 | 2 | 21 | 2 | 31 1 | 15 | 1 | 20 | 1 | 30 (2 rows) Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp > Either there needs to be an quadratic number of joins, or intermediate join > results need to be stored and reused. > > On Tue, Sep 17, 2019 at 8:50 AM Yugo Nagata <nagata@sraoss.co.jp> wrote: > >> Hi Paul, >> >> Thank you for your suggestion. >> >> On Sun, 15 Sep 2019 11:52:22 -0600 >> Paul Draper <paulddraper@gmail.com> wrote: >> >> > As I understand it, the current patch performs immediate IVM using AFTER >> > STATEMENT trigger transition tables. >> > >> > However, multiple tables can be modified *before* AFTER STATEMENT >> triggers >> > are fired. >> > >> > CREATE TABLE example1 (a int); >> > CREATE TABLE example2 (a int); >> > >> > CREATE INCREMENTAL MATERIALIZED VIEW mv AS >> > SELECT example1.a, example2.a >> > FROM example1 JOIN example2 ON a; >> > >> > WITH >> > insert1 AS (INSERT INTO example1 VALUES (1)), >> > insert2 AS (INSERT INTO example2 VALUES (1)) >> > SELECT NULL; >> > >> > Changes to example1 are visible in an AFTER STATEMENT trigger on >> example2, >> > and vice versa. Would this not result in the (1, 1) tuple being >> > "double-counted"? >> > >> > IVM needs to either: >> > >> > (1) Evaluate deltas "serially' (e.g. EACH ROW triggers) >> > >> > (2) Have simultaneous access to multiple deltas: >> > delta_mv = example1 x delta_example2 + example2 x delta_example1 - >> > delta_example1 x delta_example2 >> > >> > This latter method is the "logged" approach that has been discussed for >> > deferred evaluation. >> > >> > tl;dr It seems that AFTER STATEMENT triggers required a deferred-like >> > implementation anyway. >> >> You are right, the latest patch doesn't support the situation where >> multiple tables are modified in a query. I noticed this when working >> on self-join, which also virtually need to handle multiple table >> modification. >> >> I am now working on this issue and the next patch will enable to handle >> this situation. I plan to submit the patch during this month. Roughly >> speaking, in the new implementation, AFTER STATEMENT triggers are used to >> collect information of modified table and its changes (= transition >> tables), >> and then the only last trigger updates the view. This will avoid the >> double-counting. I think this implementation also would be a base of >> deferred approach implementation in future where "logs" are used instead >> of transition tables. >> >> Regards, >> Yugo Nagata >> >> -- >> Yugo Nagata <nagata@sraoss.co.jp> >>
On Tue, 17 Sep 2019 12:03:20 -0600 Paul Draper <paulddraper@gmail.com> wrote: > Have you had any thoughts for more than two joined tables? > > Either there needs to be an quadratic number of joins, or intermediate join > results need to be stored and reused. I don't think that we need to store intermediate join results. Suppose that we have a view V joining table R,S, and new tuples are inserted to each table, dR,dS, and dT respectively. V = R*S*T R_new = R + dR S_new = S + dS T_new = T + dT In this situation, we can calculate the new view state as bellow. V_new = R_new * S_new * T_new = (R + dR) * (S + dS) * (T + dT) = R*S*T + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT = V + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT Although the number of terms is 2^3(=8), if we can use both of pre-update state (eg. R) and post-update state (eg. R+dR), we only need only three joins. Actually, post-update state is available in AFTER trigger, and pre-update state can be calculated by using delta tables (transition tables) and cmin/xmin system columns (or snapshot). This is the approach my implementation adopts. > > On Tue, Sep 17, 2019 at 8:50 AM Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > Hi Paul, > > > > Thank you for your suggestion. > > > > On Sun, 15 Sep 2019 11:52:22 -0600 > > Paul Draper <paulddraper@gmail.com> wrote: > > > > > As I understand it, the current patch performs immediate IVM using AFTER > > > STATEMENT trigger transition tables. > > > > > > However, multiple tables can be modified *before* AFTER STATEMENT > > triggers > > > are fired. > > > > > > CREATE TABLE example1 (a int); > > > CREATE TABLE example2 (a int); > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW mv AS > > > SELECT example1.a, example2.a > > > FROM example1 JOIN example2 ON a; > > > > > > WITH > > > insert1 AS (INSERT INTO example1 VALUES (1)), > > > insert2 AS (INSERT INTO example2 VALUES (1)) > > > SELECT NULL; > > > > > > Changes to example1 are visible in an AFTER STATEMENT trigger on > > example2, > > > and vice versa. Would this not result in the (1, 1) tuple being > > > "double-counted"? > > > > > > IVM needs to either: > > > > > > (1) Evaluate deltas "serially' (e.g. EACH ROW triggers) > > > > > > (2) Have simultaneous access to multiple deltas: > > > delta_mv = example1 x delta_example2 + example2 x delta_example1 - > > > delta_example1 x delta_example2 > > > > > > This latter method is the "logged" approach that has been discussed for > > > deferred evaluation. > > > > > > tl;dr It seems that AFTER STATEMENT triggers required a deferred-like > > > implementation anyway. > > > > You are right, the latest patch doesn't support the situation where > > multiple tables are modified in a query. I noticed this when working > > on self-join, which also virtually need to handle multiple table > > modification. > > > > I am now working on this issue and the next patch will enable to handle > > this situation. I plan to submit the patch during this month. Roughly > > speaking, in the new implementation, AFTER STATEMENT triggers are used to > > collect information of modified table and its changes (= transition > > tables), > > and then the only last trigger updates the view. This will avoid the > > double-counting. I think this implementation also would be a base of > > deferred approach implementation in future where "logs" are used instead > > of transition tables. > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > -- Yugo Nagata <nagata@sraoss.co.jp>
Hi, Attached is the latest patch for supporting self-join views. This also including the following fix mentioned by Tatsuo Ishii. > > On 2019-Aug-06, Tatsuo Ishii wrote: > > > >> It's not mentioned below but some bugs including seg fault when > >> --enable-casser is enabled was also fixed in this patch. > >> > >> BTW, I found a bug with min/max support in this patch and I believe > >> Yugo is working on it. Details: > >> https://github.com/sraoss/pgsql-ivm/issues/20 This patch allows to support self-join views, simultaneous updates of more than one base tables, and also multiple updates of the same base table. I first tried to support just self-join, but I found that this is essentially same as to support simultaneous table updates, so I decided to support them in the same commit. I think this will be a base for implementing Deferred-maintenance in future. In the new implementation, AFTER triggers are used to collecting tuplestores containing transition table contents. When multiple tables are changed, multiple AFTER triggers are invoked, then the final AFTER trigger performs actual update of the matview. In addition AFTER trigger, also BEFORE trigger is used to handle global information for view maintenance. For example, suppose that we have a view V joining table R,S, and new tuples are inserted to each table, dR,dS, and dT respectively. V = R*S*T R_new = R + dR S_new = S + dS T_new = T + dT In this situation, we can calculate the new view state as bellow. V_new = R_new * S_new * T_new = (R + dR) * (S + dS) * (T + dT) = R*S*T + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT = V + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT = V + (dR *S_new*T_new) + (R*dS*T_new) + (R*S*dT) To calculate view deltas, we need both pre-state (R,S, and T) and post-state (R_new, S_new, and T_new) of base tables. Post-update states are available in AFTER trigger, and we calculate pre-update states by filtering inserted tuples using cmin/xmin system columns, and appendding deleted tuples which are contained in a old transition table. In the original core implementation, tuplestores of transition tables were freed for each query depth. However, we want to prolong their life plan because we have to preserve these for a whole query assuming some base tables are changed in other trigger functions, so I added a hack to trigger.c. Regression tests are also added for self join view, multiple change on the same table, simultaneous two table changes, and foreign reference constrains. Here are behavior examples: 1. Table definition - t: for self-join - r,s: for 2-ways join CREATE TABLE r (i int, v int); CREATE TABLE CREATE TABLE s (i int, v int); CREATE TABLE CREATE TABLE t (i int, v int); CREATE TABLE 2. Initial data INSERT INTO r VALUES (1, 10), (2, 20), (3, 30); INSERT 0 3 INSERT INTO s VALUES (1, 100), (2, 200), (3, 300); INSERT 0 3 INSERT INTO t VALUES (1, 10), (2, 20), (3, 30); INSERT 0 3 3. View definition 3.1. self-join(mv_self, v_slef) CREATE INCREMENTAL MATERIALIZED VIEW mv_self(v1, v2) AS SELECT t1.v, t2.v FROM t t1 JOIN t t2 ON t1.i = t2.i; SELECT 3 CREATE VIEW v_self(v1, v2) AS SELECT t1.v, t2.v FROM t t1 JOIN t t2 ON t1.i = t2.i; CREATE VIEW 3.2. 2-ways join (mv, v) CREATE INCREMENTAL MATERIALIZED VIEW mv(v1, v2) AS SELECT r.v, s.v FROM r JOIN s USING(i); SELECT 3 CREATE VIEW v(v1, v2) AS SELECT r.v, s.v FROM r JOIN s USING(i); CREATE VIEW 3.3 Initial contents SELECT * FROM mv_self ORDER BY v1; v1 | v2 ----+---- 10 | 10 20 | 20 30 | 30 (3 rows) SELECT * FROM mv ORDER BY v1; v1 | v2 ----+----- 10 | 100 20 | 200 30 | 300 (3 rows) 4. Update a base table for the self-join view INSERT INTO t VALUES (4,40); INSERT 0 1 DELETE FROM t WHERE i = 1; DELETE 1 UPDATE t SET v = v*10 WHERE i=2; UPDATE 1 4.1. Results - Comparison with the normal view SELECT * FROM mv_self ORDER BY v1; v1 | v2 -----+----- 30 | 30 40 | 40 200 | 200 (3 rows) SELECT * FROM v_self ORDER BY v1; v1 | v2 -----+----- 30 | 30 40 | 40 200 | 200 (3 rows) 5. pdate a base table for the 2-way join view WITH ins_r AS (INSERT INTO r VALUES (1,11) RETURNING 1), ins_r2 AS (INSERT INTO r VALUES (3,33) RETURNING 1), ins_s AS (INSERT INTO s VALUES (2,222) RETURNING 1), upd_r AS (UPDATE r SET v = v + 1000 WHERE i = 2 RETURNING 1), dlt_s AS (DELETE FROM s WHERE i = 3 RETURNING 1) SELECT NULL; ?column? ---------- (1 row) 5.1. Results - Comparison with the normal view SELECT * FROM mv ORDER BY v1; v1 | v2 ------+----- 10 | 100 11 | 100 1020 | 200 1020 | 222 (4 rows) SELECT * FROM v ORDER BY v1; v1 | v2 ------+----- 10 | 100 11 | 100 1020 | 200 1020 | 222 (4 rows) ======== Best Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Attachment
Attached is the latest patch to add support for Incremental Materialized View Maintenance (IVM). IVM allows to reflect modifications made on base tables immediately to the target materialized views. Up to now, IVM supports materialized views using: - Inner joins - Some aggregate functions (count, sum, min, max, avg) - GROUP BY - Self joins With the latest patch now IVM supports subqueries in addition to above. Known limitations are listed here: https://github.com/sraoss/pgsql-ivm/issues See more details at: https://wiki.postgresql.org/wiki/Incremental_View_Maintenance About subquery support: The patch supports simple subqueries using EXISTS: CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_exists_subquery AS SELECT a.i, a.j FROM mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE a.i = b.i); and subqueries in the FROM clause: CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_subquery AS SELECT a.i,a.j FROM mv_base_a a,( SELECT * FROM mv_base_b) b WHERE a.i = b.i; Other form of subqueries such as below are not supported: -- WHERE IN .. (subquery) is not supported CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm03 AS SELECT i,j FROM mv_base_a WHERE i IN (SELECT i FROM mv_base_b WHERE k < 103 ); -- subqueries in target list is not supported CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm05 AS SELECT i,j, (SELECT k FROM mv_base_b b WHERE a.i = b.i) FROM mv_base_a a; -- nested EXISTS subqueries is not supported CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm11 AS SELECT a.i,a.j FROM mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE EXISTS(SELECT 1 FROM mv_base_b c WHERE b.i = c.i)); -- EXISTS subquery with aggragate function is not supported CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_exists AS SELECT COUNT(*) FROM mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE a.i = b.i) OR a.i > 5; -- EXISTS subquery with condition except AND is not supported. CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm10 AS SELECT a.i,a.j FROM mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE a.i = b.i) OR a.i > 5; This work has been done by Yugo Nagata (nagata@sraoss.co.jp), Takuma Hoshiai (hoshiai@sraoss.co.jp). Adding support for EXISTS clause has been done by Takuma. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Attachment
Hi, Attached is the latest patch (v8) to add support for Incremental View Maintenance (IVM). This patch adds OUTER join support in addition to the patch (v7) submitted last week in the following post. On Fri, 22 Nov 2019 15:29:45 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > Up to now, IVM supports materialized views using: > > - Inner joins > - Some aggregate functions (count, sum, min, max, avg) > - GROUP BY > - Self joins > > With the latest patch now IVM supports subqueries in addition to > above. > > Known limitations are listed here: > > https://github.com/sraoss/pgsql-ivm/issues > > See more details at: > https://wiki.postgresql.org/wiki/Incremental_View_Maintenance * About outer join support: In case of outer-join, when a table is modified, in addition to deltas which occur in inner-join case, we also need to deletion or insertion of dangling tuples, that is, null-extended tuples generated when a join condition isn't met. [Example] --------------------------------------------- -- Create base tables and an outer join view CREATE TABLE r(i int); CREATE TABLE s(j int); INSERT INTO r VALUES (1); CREATE INCREMENTAL MATERIALIZED VIEW mv AS SELECT * FROM r LEFT JOIN s ON r.i=s.j; SELECT * FROM mv; i | j ---+--- (1 row) -- After an insertion to a base table ... INSERT INTO s VALUES (1); -- (1,1) is inserted and (1,null) is deleted from the view. SELECT * FROM mv; i | j ---+--- 1 | 1 (1 row) --------------------------------------------- Our implementation is basically based on the algorithm of Larson & Zhou (2007) [1]. Before view maintenances, the view definition query's jointree is analysed to make "view maintenance graph". This graph represents which tuples in the views are affected when a base table is modified. Specifically, tuples which are not null-extended on the modified table (that is, tuples generated by joins with the modiifed table) are directly affected. The delta of such effects are calculated similarly to inner-joins. On the other hand, dangling tuples generated by anti-joins with directly affected tuples can be indirectly affected. This means that we may need to delete dangling tuples when any tuples are inserted to a table, as well as to insert dangling tuples when tuples are deleted from a table. [1] Efficient Maintenance of Materialized Outer-Join Views (Larson & Zhou, 2007) https://ieeexplore.ieee.org/document/4221654 Although the original paper assumes that every base table and view have a unique key and tuple duplicates is disallowed, we allow this. If a view has tuple duplicates, we have to determine the number of each dangling tuple to be inserted into the view when tuples in a table are deleted. For this purpose, we count the number of each tuples which constitute a deleted tuple. These counts are stored as JSONB object in the delta table, and we use this information to maintain outer-join views. Also, we support outer self-joins that is not assumed in the original paper. * Restrictions Currently, we have following restrictions: - outer join view's targetlist must contain attributes used in join conditions - outer join view's targetlist cannot contain non-strict functions - outer join supports only simple equijoin - outer join view's WHERE clause cannot contain non null-rejecting predicates - aggregate is not supported with outer join - subquery (including EXSITS) is not supported with outer join Regression tests for all patterns of 3-way outer join and are added. Moreover, I reordered IVM related functions in matview.c so that ones which have relationship will be located closely. Moreover, I added more comments. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Attachment
Note that this is the last patch in the series of IVM patches: now we would like focus on blushing up the patches, rather than adding new SQL support to IVM, so that the patch is merged into PostgreSQL 13 (hopefully). We are very welcome reviews, comments on the patch. BTW, the SGML docs in the patch is very poor at this point. I am going to add more descriptions to the doc. > Hi, > > Attached is the latest patch (v8) to add support for Incremental View > Maintenance (IVM). This patch adds OUTER join support in addition > to the patch (v7) submitted last week in the following post. > > On Fri, 22 Nov 2019 15:29:45 +0900 (JST) > Tatsuo Ishii <ishii@sraoss.co.jp> wrote: >> Up to now, IVM supports materialized views using: >> >> - Inner joins >> - Some aggregate functions (count, sum, min, max, avg) >> - GROUP BY >> - Self joins >> >> With the latest patch now IVM supports subqueries in addition to >> above. >> >> Known limitations are listed here: >> >> https://github.com/sraoss/pgsql-ivm/issues >> >> See more details at: >> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance > > * About outer join support: > > In case of outer-join, when a table is modified, in addition to deltas > which occur in inner-join case, we also need to deletion or insertion of > dangling tuples, that is, null-extended tuples generated when a join > condition isn't met. > > [Example] > --------------------------------------------- > -- Create base tables and an outer join view > CREATE TABLE r(i int); > CREATE TABLE s(j int); > INSERT INTO r VALUES (1); > CREATE INCREMENTAL MATERIALIZED VIEW mv > AS SELECT * FROM r LEFT JOIN s ON r.i=s.j; > SELECT * FROM mv; > i | j > ---+--- > (1 row) > > -- After an insertion to a base table ... > INSERT INTO s VALUES (1); > -- (1,1) is inserted and (1,null) is deleted from the view. > SELECT * FROM mv; > i | j > ---+--- > 1 | 1 > (1 row) > --------------------------------------------- > > Our implementation is basically based on the algorithm of Larson & Zhou > (2007) [1]. Before view maintenances, the view definition query's jointree > is analysed to make "view maintenance graph". This graph represents > which tuples in the views are affected when a base table is modified. > Specifically, tuples which are not null-extended on the modified table > (that is, tuples generated by joins with the modiifed table) are directly > affected. The delta of such effects are calculated similarly to inner-joins. > > On the other hand, dangling tuples generated by anti-joins with directly > affected tuples can be indirectly affected. This means that we may need to > delete dangling tuples when any tuples are inserted to a table, as well as > to insert dangling tuples when tuples are deleted from a table. > > [1] Efficient Maintenance of Materialized Outer-Join Views (Larson & Zhou, 2007) > https://ieeexplore.ieee.org/document/4221654 > > Although the original paper assumes that every base table and view have a > unique key and tuple duplicates is disallowed, we allow this. If a view has > tuple duplicates, we have to determine the number of each dangling tuple to > be inserted into the view when tuples in a table are deleted. For this purpose, > we count the number of each tuples which constitute a deleted tuple. These > counts are stored as JSONB object in the delta table, and we use this > information to maintain outer-join views. Also, we support outer self-joins > that is not assumed in the original paper. > > * Restrictions > > Currently, we have following restrictions: > > - outer join view's targetlist must contain attributes used in join conditions > - outer join view's targetlist cannot contain non-strict functions > - outer join supports only simple equijoin > - outer join view's WHERE clause cannot contain non null-rejecting predicates > - aggregate is not supported with outer join > - subquery (including EXSITS) is not supported with outer join > > > Regression tests for all patterns of 3-way outer join and are added. > > Moreover, I reordered IVM related functions in matview.c so that ones > which have relationship will be located closely. Moreover, I added more > comments. > > Regards, > Yugo Nagata > > > -- > Yugo Nagata <nagata@sraoss.co.jp>
> Note that this is the last patch in the series of IVM patches: now we > would like focus on blushing up the patches, rather than adding new > SQL support to IVM, so that the patch is merged into PostgreSQL 13 > (hopefully). We are very welcome reviews, comments on the patch. > > BTW, the SGML docs in the patch is very poor at this point. I am going > to add more descriptions to the doc. As promised, I have created the doc (CREATE MATERIALIZED VIEW manual) patch. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml index 964c9abbf7..92f5668771 100644 --- a/doc/src/sgml/ref/create_materialized_view.sgml +++ b/doc/src/sgml/ref/create_materialized_view.sgml @@ -62,36 +62,167 @@ CREATE [ INCREMENTAL ] MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_na of the materialized view are immediately updated when base tables of the materialized view are updated. In general, this allows faster update of the materialized view at a price of slower update of the base tables - because the triggers will be invoked. + because the triggers will be invoked. We call this form of materialized + view as "Incremantal materialized View Maintenance" (IVM). </para> <para> There are restrictions of query definitions allowed to use this - option. Followings are allowed query definitions: + option. Followings are supported query definitions for IVM: <itemizedlist> + <listitem> <para> Inner joins (including self-joins). </para> </listitem> + <listitem> <para> - Some of aggregations (count, sum, avg, min, max) without HAVING clause. + Outer joins with following restrictions: + + <itemizedlist> + <listitem> + <para> + Outer join view's targetlist must contain attributes used in the + join conditions. + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv AS SELECT a.i FROM mv_base_a a LEFT +JOIN mv_base_b b ON a.i=b.i; +ERROR: targetlist must contain vars in the join condition for IVM with outer join + </programlisting> + </para> + </listitem> + + <listitem> + <para> + Outer join view's targetlist cannot contain non strict functions. + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv AS SELECT a.i, b.i, (k > 10 OR k = -1) +FROM mv_base_a a LEFT JOIN mv_base_b b ON a.i=b.i; +ERROR: targetlist cannot contain non strict functions for IVM with outer join + </programlisting> + </para> + </listitem> + + <listitem> + <para> + Outer join supports only simple equijoin. + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a +a LEFT JOIN mv_base_b b ON a.i>b.i; +ERROR: Only simple equijoin is supported for IVM with outer join +CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b,k,j) AS SELECT a.i, b.i, k j FROM +mv_base_a a LEFT JOIN mv_base_b b ON a.i=b.i AND k=j; +ERROR: Only simple equijoin is supported for IVM with outer join + </programlisting> + </para> + </listitem> + + <listitem> + <para> + Outer join view's WHERE clause cannot contain non null-rejecting + predicates. + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a +a LEFT JOIN mv_base_b b ON a.i=b.i WHERE k IS NULL; +ERROR: WHERE cannot contain non null-rejecting predicates for IVM with outer join +CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a +a LEFT JOIN mv_base_b b ON a.i=b.i WHERE (k > 10 OR k = -1); +ERROR: WHERE cannot contain non null-rejecting predicates for IVM with outer join + </programlisting> + </para> + </listitem> + + <listitem> + <para> + Aggregate is not supported with outer join. + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b,v) AS SELECT a.i, b.i, sum(k) FROM +mv_base_a a LEFT JOIN mv_base_b b ON a.i=b.i GROUP BY a.i, b.i; +ERROR: aggregate is not supported with IVM together with outer join +</programlisting> + </para> + </listitem> + + <listitem> + <para> + Subquery is not supported with outer join. + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a +a LEFT JOIN (SELECT * FROM mv_base_b) b ON a.i=b.i; +ERROR: subquery is not supported with IVM together with outer join +CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a +a LEFT JOIN mv_base_b b ON a.i=b.i WHERE EXISTS (SELECT 1 FROM mv_base_b b2 +WHERE a.j = b.k); +ERROR: subquery is not supported by IVM together with outer join + </programlisting> + </para> + </listitem> + </itemizedlist> </para> - </listitem> - </itemizedlist> - Prohibited queries with this option include followings: - <itemizedlist> + </listitem> + <listitem> <para> - Outer joins. + Subqueries. However following forms are not supported. </para> - </listitem> + + <para> + WHERE IN .. (subquery) is not supported: + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm03 AS SELECT i,j FROM +mv_base_a WHERE i IN (SELECT i FROM mv_base_b WHERE k < 103 ); + </programlisting> + </para> + <para> + subqueries in target list is not supported: + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm05 AS SELECT i,j, (SELECT k +FROM mv_base_b b WHERE a.i = b.i) FROM mv_base_a a; + </programlisting> + </para> + <para> + Nested EXISTS subqueries is not supported: + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm11 AS SELECT a.i,a.j FROM +mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE EXISTS(SELECT +1 FROM mv_base_b c WHERE b.i = c.i)); + </programlisting> + </para> + <para> + EXISTS subquery with aggregate function is not supported: + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_exists AS SELECT COUNT(*) +FROM mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE a.i = +b.i) OR a.i > 5; + </programlisting> + </para> + <para> + EXISTS subquery with condition other than AND is not supported: + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm10 AS SELECT a.i,a.j FROM +mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE a.i = b.i) OR +a.i > 5; + </programlisting> + </para> + </listitem> + <listitem> <para> - Subqueries. + Some of aggregations (count, sum, avg, min, max) without HAVING + clause. However, aggregate functions in subquery is not supported: + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm09 AS SELECT a.i,a.j FROM mv_base_a +a, (SELECT i, COUNT(*) FROM mv_base_b GROUP BY i) b WHERE a.i = b.i; + </programlisting> </para> </listitem> + </itemizedlist> + + Unsupported queries with this option include followings: + + <itemizedlist> <listitem> <para> Aggregations other than count, sum, avg, min and max. @@ -111,24 +242,50 @@ CREATE [ INCREMENTAL ] MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_na Other restrictions include: <itemizedlist> + <listitem> <para> - Incremental materialized views must be based on simple base - tables. Views or materialized views are not allowed to create - incremental materialized views. + IVMs must be based on simple base tables. Views or materialized views + are not allowed to create IVM on them. </para> </listitem> + + <listitem> + <para> + <command>pg_dump</command> and <command>pg_restore</command> do not + support IVMs. IVMs are dumped as ordinary materialized views. + </para> + </listitem> + + <listitem> + <para> + <command>REFRESH MATERIALIZED VIEW</command> does not support IVMs. + </para> + </listitem> + <listitem> <para> When TRUNCATE command is executed on a base table, nothing occurs and this is not applied to the materialized view. </para> </listitem> + + <listitem> + <para> + IVM including system columns is not supported. + <programlisting> +CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm02 AS SELECT i,j FROM mv_base_a WHERE xmin = '610'; +ERROR: system column is not supported with IVM + </programlisting> + </para> + </listitem> + <listitem> <para> - Incremental materialized views are not supported by logical replication. + IVMs not supported by logical replication. </para> </listitem> + </itemizedlist> </para>
On Thu, 28 Nov 2019 11:26:40 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > Note that this is the last patch in the series of IVM patches: now we > > would like focus on blushing up the patches, rather than adding new > > SQL support to IVM, so that the patch is merged into PostgreSQL 13 > > (hopefully). We are very welcome reviews, comments on the patch. > > > > BTW, the SGML docs in the patch is very poor at this point. I am going > > to add more descriptions to the doc. > > As promised, I have created the doc (CREATE MATERIALIZED VIEW manual) > patch. - because the triggers will be invoked. + because the triggers will be invoked. We call this form of materialized + view as "Incremantal materialized View Maintenance" (IVM). This part seems incorrect to me. Incremental (materialized) View Maintenance (IVM) is a way to maintain materialized views and is not a word to refer views to be maintained. However, it would be useful if there is a term referring views which can be maintained using IVM. Off the top of my head, we can call this Incrementally Maintainable Views (= IMVs), but this might cofusable with IVM, so I'll think about that a little more.... Regards, Yugo Nagata > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp -- Yugo Nagata <nagata@sraoss.co.jp>
One thing pending in this development line is how to catalogue aggregate functions that can be used in incrementally-maintainable views. I saw a brief mention somewhere that the devels knew it needed to be done, but I don't see in the thread that they got around to doing it. Did you guys have any thoughts on how it can be represented in catalogs? It seems sine-qua-non ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> As promised, I have created the doc (CREATE MATERIALIZED VIEW manual) >> patch. > > - because the triggers will be invoked. > + because the triggers will be invoked. We call this form of materialized > + view as "Incremantal materialized View Maintenance" (IVM). > > This part seems incorrect to me. Incremental (materialized) View > Maintenance (IVM) is a way to maintain materialized views and is not a > word to refer views to be maintained. > > However, it would be useful if there is a term referring views which > can be maintained using IVM. Off the top of my head, we can call this > Incrementally Maintainable Views (= IMVs), but this might cofusable with > IVM, so I'll think about that a little more.... But if we introduce IMV, IVM would be used in much less places in the doc and source code, so less confusion would happen, I guess. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
> Hi, > > Attached is the latest patch (v8) to add support for Incremental View > Maintenance (IVM). This patch adds OUTER join support in addition > to the patch (v7) submitted last week in the following post. There's a compiler warning: matview.c: In function ‘getRteListCell’: matview.c:2685:9: warning: ‘rte_lc’ may be used uninitialized in this function [-Wmaybe-uninitialized] return rte_lc; ^~~~~~ Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On Fri, 29 Nov 2019 09:50:49 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > Hi, > > > > Attached is the latest patch (v8) to add support for Incremental View > > Maintenance (IVM). This patch adds OUTER join support in addition > > to the patch (v7) submitted last week in the following post. > > There's a compiler warning: > > matview.c: In function ‘getRteListCell’: > matview.c:2685:9: warning: ‘rte_lc’ may be used uninitialized in this function [-Wmaybe-uninitialized] > return rte_lc; > ^~~~~~ Thanks! I'll fix this. > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp -- Yugo Nagata <nagata@sraoss.co.jp>
On Fri, 29 Nov 2019 07:19:44 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > >> As promised, I have created the doc (CREATE MATERIALIZED VIEW manual) > >> patch. > > > > - because the triggers will be invoked. > > + because the triggers will be invoked. We call this form of materialized > > + view as "Incremantal materialized View Maintenance" (IVM). > > > > This part seems incorrect to me. Incremental (materialized) View > > Maintenance (IVM) is a way to maintain materialized views and is not a > > word to refer views to be maintained. > > > > However, it would be useful if there is a term referring views which > > can be maintained using IVM. Off the top of my head, we can call this > > Incrementally Maintainable Views (= IMVs), but this might cofusable with > > IVM, so I'll think about that a little more.... > > But if we introduce IMV, IVM would be used in much less places in the > doc and source code, so less confusion would happen, I guess. Make senses. However, we came to think that "Incrementally Maintainable Materialized Views" (IMMs) would be good. So, how about using this for now? When other better opinions are raised, let's discuss again Regards, Yugo Nagata > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp -- Yugo Nagata <nagata@sraoss.co.jp>
>> But if we introduce IMV, IVM would be used in much less places in the >> doc and source code, so less confusion would happen, I guess. > > Make senses. However, we came to think that "Incrementally Maintainable > Materialized Views" (IMMs) would be good. So, how about using this for now? > When other better opinions are raised, let's discuss again Sounds good to me. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hello, Thanks a lot for working on this. It's a great (and big!) feature and I can see that a lot of work has been put into writing this patch. I started looking at the patch (v8), but as it's quite big: 34 files changed, 5444 insertions(+), 69 deletions(-) I'm having a bit of trouble reading through, which I suspect others may be too. Perhaps, it can be easier for you, as authors, to know everything that's being changed (added, removed, existing code rewritten), but certainly not for a reviewer, so I think it would be a good idea to try to think dividing this into parts. I still don't have my head wrapped around the topic of materialized view maintenance, but roughly it looks to me like there are really *two* features that are being added: 1. Add a new method to refresh an MV incrementally; IIUC, there's already one method that's used by REFRESH MATERIALIZED VIEW CONCURRENTLY, correct? 2. Make the refresh automatic (using triggers on the component tables) Maybe, there are even: 0. Infrastructure additions As you can tell, having the patch broken down like this would allow us to focus on the finer aspects of each of the problem being solved and solution being adopted, for example: * It would be easier for someone having an expert opinion on how to implement incremental refresh to have to only look at the patch for (1). If the new method handles more query types than currently, which obviously means more code is needed, which in turn entails possibility of bugs, despite the best efforts. It would be better to get more eyeballs at this portion of the patch and having it isolated seems like a good way to attract more eyeballs. * Someone well versed in trigger infrastructure can help fine tune the patch for (2) and so on. So, please consider giving some thought to this. Thanks, Amit
The following review on our patch was posted on another thread, so I quote here. The tab completion is Hoshiai-san's work, so he will handle this issue. Regards, Yugo Nagata. On Thu, 28 Nov 2019 13:00:05 +0900 nuko yokohama <nuko.yokohama@gmail.com> wrote: > Hi. > > I'm using the "Incremental Materialized View Maintenance" patch and have > reported the following issues. > (https://commitfest.postgresql.org/25/2138/) > > To Suggest a "DROP INCREMENTAL MATERIALIZED VIEW" in psql, but the syntax > error when you run. > ("DROP MATERIALIZED VIEW" command can drop Incremental Materialozed view > normally.) > > > ramendb=# CREATE INCREMENTAL MATERIALIZED VIEW pref_count AS SELECT pref, > COUNT(pref) FROM shops GROUP BY pref; > SELECT 48 > ramendb=# \d pref_count > Materialized view "public.pref_count" > Column | Type | Collation | Nullable | Default > ---------------+--------+-----------+----------+--------- > pref | text | | | > count | bigint | | | > __ivm_count__ | bigint | | | > > ramendb=# DROP IN > INCREMENTAL MATERIALIZED VIEW INDEX > ramendb=# DROP INCREMENTAL MATERIALIZED VIEW pref_count; > 2019-11-27 11:51:03.916 UTC [9759] ERROR: syntax error at or near > "INCREMENTAL" at character 6 > 2019-11-27 11:51:03.916 UTC [9759] STATEMENT: DROP INCREMENTAL > MATERIALIZED VIEW pref_count; > ERROR: syntax error at or near "INCREMENTAL" > LINE 1: DROP INCREMENTAL MATERIALIZED VIEW pref_count; > ^ > ramendb=# DROP MATERIALIZED VIEW pref_count ; > DROP MATERIALIZED VIEW > ramendb=# > > > Regard. -- Yugo Nagata <nagata@sraoss.co.jp> -- Yugo Nagata <nagata@sraoss.co.jp>
On Fri, 29 Nov 2019 15:45:13 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > The following review on our patch was posted on another thread, > so I quote here. The tab completion is Hoshiai-san's work, so > he will handle this issue. > > Regards, > Yugo Nagata. > > On Thu, 28 Nov 2019 13:00:05 +0900 > nuko yokohama <nuko.yokohama@gmail.com> wrote: > > > Hi. > > > > I'm using the "Incremental Materialized View Maintenance" patch and have > > reported the following issues. > > (https://commitfest.postgresql.org/25/2138/) > > > > To Suggest a "DROP INCREMENTAL MATERIALIZED VIEW" in psql, but the syntax > > error when you run. > > ("DROP MATERIALIZED VIEW" command can drop Incremental Materialozed view > > normally.) Thank you for your review. This psql's suggestion is mistake, "INCREMENTAL MATERIALIZED" phrase is only used for CREATE statement. I will fix it as the following: diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 2051bc3..8c4b211 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -1001,7 +1001,7 @@ static const pgsql_thing_t words_after_create[] = { {"FOREIGN TABLE", NULL, NULL, NULL}, {"FUNCTION", NULL, NULL, Query_for_list_of_functions}, {"GROUP", Query_for_list_of_roles}, - {"INCREMENTAL MATERIALIZED VIEW", NULL, NULL, &Query_for_list_of_matviews}, + {"INCREMENTAL MATERIALIZED VIEW", NULL, NULL, &Query_for_list_of_matviews, THING_NO_DROP | THING_NO_ALTER}, {"INDEX", NULL, NULL, &Query_for_list_of_indexes}, {"LANGUAGE", Query_for_list_of_languages}, {"LARGE OBJECT", NULL, NULL, NULL, THING_NO_CREATE | THING_NO_DROP}, Best Regards, Takuma Hoshiai > > ramendb=# CREATE INCREMENTAL MATERIALIZED VIEW pref_count AS SELECT pref, > > COUNT(pref) FROM shops GROUP BY pref; > > SELECT 48 > > ramendb=# \d pref_count > > Materialized view "public.pref_count" > > Column | Type | Collation | Nullable | Default > > ---------------+--------+-----------+----------+--------- > > pref | text | | | > > count | bigint | | | > > __ivm_count__ | bigint | | | > > > > ramendb=# DROP IN > > INCREMENTAL MATERIALIZED VIEW INDEX > > ramendb=# DROP INCREMENTAL MATERIALIZED VIEW pref_count; > > 2019-11-27 11:51:03.916 UTC [9759] ERROR: syntax error at or near > > "INCREMENTAL" at character 6 > > 2019-11-27 11:51:03.916 UTC [9759] STATEMENT: DROP INCREMENTAL > > MATERIALIZED VIEW pref_count; > > ERROR: syntax error at or near "INCREMENTAL" > > LINE 1: DROP INCREMENTAL MATERIALIZED VIEW pref_count; > > ^ > > ramendb=# DROP MATERIALIZED VIEW pref_count ; > > DROP MATERIALIZED VIEW > > ramendb=# > > > > > > Regard. > > > -- > Yugo Nagata <nagata@sraoss.co.jp> > > > -- > Yugo Nagata <nagata@sraoss.co.jp> > -- Takuma Hoshiai <hoshiai@sraoss.co.jp>
On Thu, 28 Nov 2019 11:03:33 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > One thing pending in this development line is how to catalogue aggregate > functions that can be used in incrementally-maintainable views. > I saw a brief mention somewhere that the devels knew it needed to be > done, but I don't see in the thread that they got around to doing it. > Did you guys have any thoughts on how it can be represented in catalogs? > It seems sine-qua-non ... Yes, this is a pending issue. Currently, supported aggregate functions are identified their name, that is, we support aggregate functions named "count", "sum", "avg", "min", or "max". As mentioned before, this is not robust because there might be user-defined aggregates with these names although all built-in aggregates can be used in IVM. In our implementation, the new aggregate values are calculated using "+" and "-" operations for sum and count, "/" for agv, and ">=" / "<=" for min/max. Therefore, if there is a user-defined aggregate on a user-defined type which doesn't support these operators, errors will raise. Obviously, this is a problem. Even if these operators are defined, the semantics of user-defined aggregate functions might not match with the way of maintaining views, and resultant might be incorrect. I think there are at least three options to prevent these problems. In the first option, we support only built-in aggregates which we know able to handle correctly. Supported aggregates can be identified using their OIDs. User-defined aggregates are not supported. I think this is the simplest and easiest way. Second, supported aggregates can be identified using name, like the current implementation, but also it is checked if required operators are defined. In this case, user-defined aggregates are allowed to some extent and we can prevent errors during IVM although aggregates value in view might be incorrect if the semantics doesn't match. Third, we can add a new attribute to pg_aggregate which shows if each aggregate can be used in IVM. We don't need to use names or OIDs list of supported aggregates although we need modification of the system catalogue. Regarding pg_aggregate, now we have aggcombinefn attribute for supporting partial aggregation. Maybe we could use combine functions to calculate new aggregate values in IVM when tuples are inserted into a table. However, in the context of IVM, we also need other function used when tuples are deleted from a table, so we can not use partial aggregation for IVM in the current implementation. This might be another option to implement "inverse combine function"(?) for IVM, but I am not sure it worth. Regards, Yugo Nagata > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Yugo Nagata <nagata@sraoss.co.jp>
On Fri, 29 Nov 2019 15:34:52 +0900 Amit Langote <amitlangote09@gmail.com> wrote: > Thanks a lot for working on this. It's a great (and big!) feature and > I can see that a lot of work has been put into writing this patch. I > started looking at the patch (v8), but as it's quite big: > > 34 files changed, 5444 insertions(+), 69 deletions(-) Thank you for your reviewing the patch! Yes, this is a big patch athough > I'm having a bit of trouble reading through, which I suspect others > may be too. Perhaps, it can be easier for you, as authors, to know > everything that's being changed (added, removed, existing code > rewritten), but certainly not for a reviewer, so I think it would be a > good idea to try to think dividing this into parts. I still don't I agree with you. We also think the need to split the patch and we are considering the way. > have my head wrapped around the topic of materialized view > maintenance, but roughly it looks to me like there are really *two* > features that are being added: > > 1. Add a new method to refresh an MV incrementally; IIUC, there's > already one method that's used by REFRESH MATERIALIZED VIEW > CONCURRENTLY, correct? No, REFRESH MATERIALIZED VIEW CONCURRENTLY is not the way to refresh materialized views. This just acquires weaker locks on views to not prevent SELECT, so this calculate the content of the view completely from scratch. There is no method to incrementally refresh materialized views in the current PostgreSQL. Also, we didn't implement incremental refresh on REFRESH command in this patch. This supports only automatically refresh using triggers. However, we used the code for REFRESH in our IVM implementation, so I think splitting the patch according to this point of view can make sense. > 2. Make the refresh automatic (using triggers on the component tables) > > Maybe, there are even: > > 0. Infrastructure additions Yes, we have a bit modification on the infrastructure, for example, trigger.c. > As you can tell, having the patch broken down like this would allow us > to focus on the finer aspects of each of the problem being solved and > solution being adopted, for example: > > * It would be easier for someone having an expert opinion on how to > implement incremental refresh to have to only look at the patch for > (1). If the new method handles more query types than currently, which > obviously means more code is needed, which in turn entails possibility > of bugs, despite the best efforts. It would be better to get more > eyeballs at this portion of the patch and having it isolated seems > like a good way to attract more eyeballs. > > * Someone well versed in trigger infrastructure can help fine tune the > patch for (2) > > and so on. > > So, please consider giving some thought to this. Agreed. Although I am not sure we will do it as above way, we will consider to split the patch, anyway. Thanks. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
On Fri, 29 Nov 2019 18:16:00 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > On Fri, 29 Nov 2019 15:34:52 +0900 > Amit Langote <amitlangote09@gmail.com> wrote: > > > Thanks a lot for working on this. It's a great (and big!) feature and > > I can see that a lot of work has been put into writing this patch. I > > started looking at the patch (v8), but as it's quite big: > > > > 34 files changed, 5444 insertions(+), 69 deletions(-) > > Thank you for your reviewing the patch! Yes, this is a big patch > athough Sorry, an unfinished line was left... Please ignore this. -- Yugo Nagata <nagata@sraoss.co.jp>
On Fri, Nov 29, 2019 at 06:19:54PM +0900, Yugo Nagata wrote: > Sorry, an unfinished line was left... Please ignore this. A rebase looks to be necessary, Mr Robot complains that the patch does not apply cleanly. As the thread is active recently, I have moved the patch to next CF, waiting on author. -- Michael
Attachment
Michael, > A rebase looks to be necessary, Mr Robot complains that the patch does > not apply cleanly. As the thread is active recently, I have moved the > patch to next CF, waiting on author. Thank you for taking care of this patch. Hoshiai-san, can you please rebase the patch? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
>> One thing pending in this development line is how to catalogue aggregate >> functions that can be used in incrementally-maintainable views. >> I saw a brief mention somewhere that the devels knew it needed to be >> done, but I don't see in the thread that they got around to doing it. >> Did you guys have any thoughts on how it can be represented in catalogs? >> It seems sine-qua-non ... > > Yes, this is a pending issue. Currently, supported aggregate functions are > identified their name, that is, we support aggregate functions named "count", > "sum", "avg", "min", or "max". As mentioned before, this is not robust > because there might be user-defined aggregates with these names although all > built-in aggregates can be used in IVM. > > In our implementation, the new aggregate values are calculated using "+" and > "-" operations for sum and count, "/" for agv, and ">=" / "<=" for min/max. > Therefore, if there is a user-defined aggregate on a user-defined type which > doesn't support these operators, errors will raise. Obviously, this is a > problem. Even if these operators are defined, the semantics of user-defined > aggregate functions might not match with the way of maintaining views, and > resultant might be incorrect. > > I think there are at least three options to prevent these problems. > > In the first option, we support only built-in aggregates which we know able > to handle correctly. Supported aggregates can be identified using their OIDs. > User-defined aggregates are not supported. I think this is the simplest and > easiest way. I think this is enough for the first cut of IVM. So +1. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
> On Fri, Nov 29, 2019 at 06:19:54PM +0900, Yugo Nagata wrote: >> Sorry, an unfinished line was left... Please ignore this. > > A rebase looks to be necessary, Mr Robot complains that the patch does > not apply cleanly. Is this because the patch has ".gz" suffix? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On Mon, 02 Dec 2019 10:01:18 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > Michael, > > > A rebase looks to be necessary, Mr Robot complains that the patch does > > not apply cleanly. As the thread is active recently, I have moved the > > patch to next CF, waiting on author. > > Thank you for taking care of this patch. Hoshiai-san, can you please > rebase the patch? Sure, I re-created a patch. This contains 'IVM_8.patch' and 'create_materialized_view.patch', and I checked to apply a patch on latest master. > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp > Best Regards, -- Takuma Hoshiai <hoshiai@sraoss.co.jp>
Attachment
On Mon, 02 Dec 2019 10:36:36 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > >> One thing pending in this development line is how to catalogue aggregate > >> functions that can be used in incrementally-maintainable views. > >> I saw a brief mention somewhere that the devels knew it needed to be > >> done, but I don't see in the thread that they got around to doing it. > >> Did you guys have any thoughts on how it can be represented in catalogs? > >> It seems sine-qua-non ... > > > > Yes, this is a pending issue. Currently, supported aggregate functions are > > identified their name, that is, we support aggregate functions named "count", > > "sum", "avg", "min", or "max". As mentioned before, this is not robust > > because there might be user-defined aggregates with these names although all > > built-in aggregates can be used in IVM. > > > > In our implementation, the new aggregate values are calculated using "+" and > > "-" operations for sum and count, "/" for agv, and ">=" / "<=" for min/max. > > Therefore, if there is a user-defined aggregate on a user-defined type which > > doesn't support these operators, errors will raise. Obviously, this is a > > problem. Even if these operators are defined, the semantics of user-defined > > aggregate functions might not match with the way of maintaining views, and > > resultant might be incorrect. > > > > I think there are at least three options to prevent these problems. > > > > In the first option, we support only built-in aggregates which we know able > > to handle correctly. Supported aggregates can be identified using their OIDs. > > User-defined aggregates are not supported. I think this is the simplest and > > easiest way. > > I think this is enough for the first cut of IVM. So +1. If there is no objection, I will add the check of aggregate functions by this way. Thanks. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
On 2019-Dec-02, Yugo Nagata wrote: > On Mon, 02 Dec 2019 10:36:36 +0900 (JST) > Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > > >> One thing pending in this development line is how to catalogue aggregate > > >> functions that can be used in incrementally-maintainable views. > > >> I saw a brief mention somewhere that the devels knew it needed to be > > >> done, but I don't see in the thread that they got around to doing it. > > >> Did you guys have any thoughts on how it can be represented in catalogs? > > >> It seems sine-qua-non ... > > > In the first option, we support only built-in aggregates which we know able > > > to handle correctly. Supported aggregates can be identified using their OIDs. > > > User-defined aggregates are not supported. I think this is the simplest and > > > easiest way. > > > > I think this is enough for the first cut of IVM. So +1. > > If there is no objection, I will add the check of aggregate functions > by this way. Thanks. The way I imagine things is that there's (one or more) new column in pg_aggregate that links to the operator(s) (or function(s)?) that support incremental update of the MV for that aggregate function. Is that what you're proposing? All that query-construction business in apply_delta() looks quite suspicious. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 2 Dec 2019 13:48:40 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2019-Dec-02, Yugo Nagata wrote: > > > On Mon, 02 Dec 2019 10:36:36 +0900 (JST) > > Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > > > > >> One thing pending in this development line is how to catalogue aggregate > > > >> functions that can be used in incrementally-maintainable views. > > > >> I saw a brief mention somewhere that the devels knew it needed to be > > > >> done, but I don't see in the thread that they got around to doing it. > > > >> Did you guys have any thoughts on how it can be represented in catalogs? > > > >> It seems sine-qua-non ... > > > > > In the first option, we support only built-in aggregates which we know able > > > > to handle correctly. Supported aggregates can be identified using their OIDs. > > > > User-defined aggregates are not supported. I think this is the simplest and > > > > easiest way. > > > > > > I think this is enough for the first cut of IVM. So +1. > > > > If there is no objection, I will add the check of aggregate functions > > by this way. Thanks. > > The way I imagine things is that there's (one or more) new column in > pg_aggregate that links to the operator(s) (or function(s)?) that > support incremental update of the MV for that aggregate function. Is > that what you're proposing? The way I am proposing above is using OID to check if a aggregate can be used in IVM. This allows only a part of built-in aggreagete functions. This way you mentioned was proposed as one of options as following. On Fri, 29 Nov 2019 17:33:28 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > Third, we can add a new attribute to pg_aggregate which shows if each > aggregate can be used in IVM. We don't need to use names or OIDs list of > supported aggregates although we need modification of the system catalogue. > > Regarding pg_aggregate, now we have aggcombinefn attribute for supporting > partial aggregation. Maybe we could use combine functions to calculate new > aggregate values in IVM when tuples are inserted into a table. However, in > the context of IVM, we also need other function used when tuples are deleted > from a table, so we can not use partial aggregation for IVM in the current > implementation. This might be another option to implement "inverse combine > function"(?) for IVM, but I am not sure it worth. If we add "inverse combine function" in pg_aggregate that takes two results of aggregating over tuples in a view and tuples in a delta, and produces a result of aggregating over tuples in the view after tuples in the delta are deleted from this, it would allow to calculate new aggregate values in IVM using aggcombinefn together when the aggregate function provides both functions. Another idea is to use support functions for moving-aggregate mode which are already provided in pg_aggregate. However, in this case, we have to apply tuples in the delta to the view one by one instead of applying after aggregating tuples in the delta. In both case, we can not use these support functions in SQL via SPI because the type of some aggregates is internal. We have to alter the current apply_delta implementation if we adopt a way using these support functions. Instead, we also can add support functions for IVM independent to partial aggregate or moving-aggregate. Maybe this is also one of options. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
subsequent INSERT operations to the base table will fail.
Error message.
```
ERROR: could not open relation with OID 0
```
Execution log.
```
[ec2-user@ip-10-0-1-10 ivm]$ psql -U postgres test -e -f ~/test/ivm/alter_rename_bug.sql
DROP TABLE IF EXISTS table_x CASCADE;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:1: NOTICE: drop cascades to materialized view group_imv
DROP TABLE
CREATE TABLE table_x AS
SELECT generate_series(1, 10000) AS id,
ROUND(random()::numeric * 100, 2) AS data,
CASE (random() * 5)::integer
WHEN 4 THEN 'group-a'
WHEN 3 THEN 'group-b'
ELSE 'group-c'
END AS part_key
;
SELECT 10000
Table "public.table_x"
Column | Type | Collation | Nullable | Default
----------+---------+-----------+----------+---------
id | integer | | |
data | numeric | | |
part_key | text | | |
DROP MATERIALIZED VIEW IF EXISTS group_imv;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:15: NOTICE: materialized view "group_imv" does not exist, skipping
DROP MATERIALIZED VIEW
CREATE INCREMENTAL MATERIALIZED VIEW group_imv AS
SELECT part_key, COUNT(*), MAX(data), MIN(data), SUM(data), AVG(data)
FROM table_x
GROUP BY part_key;
SELECT 3
List of relations
Schema | Name | Type | Owner
--------+-----------+-------------------+----------
public | group_imv | materialized view | postgres
public | table_x | table | postgres
(2 rows)
Materialized view "public.group_imv"
Column | Type | Collation | Nullable | Default
-------------------+---------+-----------+----------+---------
part_key | text | | |
count | bigint | | |
max | numeric | | |
min | numeric | | |
sum | numeric | | |
avg | numeric | | |
__ivm_count_max__ | bigint | | |
__ivm_count_min__ | bigint | | |
__ivm_count_sum__ | bigint | | |
__ivm_count_avg__ | bigint | | |
__ivm_sum_avg__ | numeric | | |
__ivm_count__ | bigint | | |
SELECT * FROM group_imv ORDER BY part_key;
part_key | count | max | min | sum | avg
----------+-------+-------+------+-----------+---------------------
group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777
group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
(3 rows)
ALTER MATERIALIZED VIEW group_imv RENAME TO group_imv2;
ALTER MATERIALIZED VIEW
List of relations
Schema | Name | Type | Owner
--------+------------+-------------------+----------
public | group_imv2 | materialized view | postgres
public | table_x | table | postgres
(2 rows)
Materialized view "public.group_imv2"
Column | Type | Collation | Nullable | Default
-------------------+---------+-----------+----------+---------
part_key | text | | |
count | bigint | | |
max | numeric | | |
min | numeric | | |
sum | numeric | | |
avg | numeric | | |
__ivm_count_max__ | bigint | | |
__ivm_count_min__ | bigint | | |
__ivm_count_sum__ | bigint | | |
__ivm_count_avg__ | bigint | | |
__ivm_sum_avg__ | numeric | | |
__ivm_count__ | bigint | | |
SET client_min_messages = debug5;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:30: DEBUG: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
SET
INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2),
'gruop_d');
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG: relation "public.group_imv" does not exist
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG: relation "public.group_imv" does not exist
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: ERROR: could not open relation with OID 0
RESET client_min_messages;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:34: DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
RESET
SELECT * FROM group_imv2 ORDER BY part_key;
part_key | count | max | min | sum | avg
----------+-------+-------+------+-----------+---------------------
group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777
group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
(3 rows)
ALTER MATERIALIZED VIEW group_imv2 RENAME TO group_imv;
ALTER MATERIALIZED VIEW
INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2),
'gruop_d');
INSERT 0 1
SELECT * FROM group_imv ORDER BY part_key;
part_key | count | max | min | sum | avg
----------+-------+-------+-------+-----------+---------------------
group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777
group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
gruop_d | 1 | 81.43 | 81.43 | 81.43 | 81.4300000000000000
(4 rows)
[ec2-user@ip-10-0-1-10 ivm]$
```
This may be because IVM internal information is not modified when the view name is renamed.
Hi,
I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.
I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].
In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.
However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.
For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.
We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.
However, the discussion about IVM is now stoped, so we would like to restart and
progress this.
Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.
1. How to extract changes on base tables
I think there would be at least two approaches for it.
- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding
In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.
If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.
2. How to compute the delta to be applied to materialized views
Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.
There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.
3. How to identify rows to be modifed in materialized views
When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.
4. When to maintain materialized views
There are two candidates of the timing of maintenance, immediate (eager) or deferred.
In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.
In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.
In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].
[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066
Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>
On Mon, 2 Dec 2019 13:48:40 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2019-Dec-02, Yugo Nagata wrote:
>
> > On Mon, 02 Dec 2019 10:36:36 +0900 (JST)
> > Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> >
> > > >> One thing pending in this development line is how to catalogue aggregate
> > > >> functions that can be used in incrementally-maintainable views.
> > > >> I saw a brief mention somewhere that the devels knew it needed to be
> > > >> done, but I don't see in the thread that they got around to doing it.
> > > >> Did you guys have any thoughts on how it can be represented in catalogs?
> > > >> It seems sine-qua-non ...
>
> > > > In the first option, we support only built-in aggregates which we know able
> > > > to handle correctly. Supported aggregates can be identified using their OIDs.
> > > > User-defined aggregates are not supported. I think this is the simplest and
> > > > easiest way.
> > >
> > > I think this is enough for the first cut of IVM. So +1.
> >
> > If there is no objection, I will add the check of aggregate functions
> > by this way. Thanks.
>
> The way I imagine things is that there's (one or more) new column in
> pg_aggregate that links to the operator(s) (or function(s)?) that
> support incremental update of the MV for that aggregate function. Is
> that what you're proposing?
The way I am proposing above is using OID to check if a aggregate can be
used in IVM. This allows only a part of built-in aggreagete functions.
This way you mentioned was proposed as one of options as following.
On Fri, 29 Nov 2019 17:33:28 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
> Third, we can add a new attribute to pg_aggregate which shows if each
> aggregate can be used in IVM. We don't need to use names or OIDs list of
> supported aggregates although we need modification of the system catalogue.
>
> Regarding pg_aggregate, now we have aggcombinefn attribute for supporting
> partial aggregation. Maybe we could use combine functions to calculate new
> aggregate values in IVM when tuples are inserted into a table. However, in
> the context of IVM, we also need other function used when tuples are deleted
> from a table, so we can not use partial aggregation for IVM in the current
> implementation. This might be another option to implement "inverse combine
> function"(?) for IVM, but I am not sure it worth.
If we add "inverse combine function" in pg_aggregate that takes two results
of aggregating over tuples in a view and tuples in a delta, and produces a
result of aggregating over tuples in the view after tuples in the delta are
deleted from this, it would allow to calculate new aggregate values in IVM
using aggcombinefn together when the aggregate function provides both
functions.
Another idea is to use support functions for moving-aggregate mode which are
already provided in pg_aggregate. However, in this case, we have to apply
tuples in the delta to the view one by one instead of applying after
aggregating tuples in the delta.
In both case, we can not use these support functions in SQL via SPI because
the type of some aggregates is internal. We have to alter the current
apply_delta implementation if we adopt a way using these support functions.
Instead, we also can add support functions for IVM independent to partial
aggregate or moving-aggregate. Maybe this is also one of options.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Wed, 4 Dec 2019 21:18:02 +0900 nuko yokohama <nuko.yokohama@gmail.com> wrote: > Hi. > > I found the problem after running "ALTER MATERIALIZED VIEW ... RENAME TO". > If a view created with "CREATE INCREMENT MATERIALIZED VIEW" is renamed, > subsequent INSERT operations to the base table will fail. > > Error message. > ``` > ERROR: could not open relation with OID 0 Thank you for your pointing out this issue! This error occurs because the view's OID is retrieved using the view name. Considering that the name can be changed, this is obviously wrong. We'll fix it. Regards, Yugo Nagata > ``` > > Execution log. > ``` > [ec2-user@ip-10-0-1-10 ivm]$ psql -U postgres test -e -f > ~/test/ivm/alter_rename_bug.sql > DROP TABLE IF EXISTS table_x CASCADE; > psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:1: NOTICE: drop cascades > to materialized view group_imv > DROP TABLE > CREATE TABLE table_x AS > SELECT generate_series(1, 10000) AS id, > ROUND(random()::numeric * 100, 2) AS data, > CASE (random() * 5)::integer > WHEN 4 THEN 'group-a' > WHEN 3 THEN 'group-b' > ELSE 'group-c' > END AS part_key > ; > SELECT 10000 > Table "public.table_x" > Column | Type | Collation | Nullable | Default > ----------+---------+-----------+----------+--------- > id | integer | | | > data | numeric | | | > part_key | text | | | > > DROP MATERIALIZED VIEW IF EXISTS group_imv; > psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:15: NOTICE: materialized > view "group_imv" does not exist, skipping > DROP MATERIALIZED VIEW > CREATE INCREMENTAL MATERIALIZED VIEW group_imv AS > SELECT part_key, COUNT(*), MAX(data), MIN(data), SUM(data), AVG(data) > FROM table_x > GROUP BY part_key; > SELECT 3 > List of relations > Schema | Name | Type | Owner > --------+-----------+-------------------+---------- > public | group_imv | materialized view | postgres > public | table_x | table | postgres > (2 rows) > > Materialized view "public.group_imv" > Column | Type | Collation | Nullable | Default > -------------------+---------+-----------+----------+--------- > part_key | text | | | > count | bigint | | | > max | numeric | | | > min | numeric | | | > sum | numeric | | | > avg | numeric | | | > __ivm_count_max__ | bigint | | | > __ivm_count_min__ | bigint | | | > __ivm_count_sum__ | bigint | | | > __ivm_count_avg__ | bigint | | | > __ivm_sum_avg__ | numeric | | | > __ivm_count__ | bigint | | | > > SELECT * FROM group_imv ORDER BY part_key; > part_key | count | max | min | sum | avg > ----------+-------+-------+------+-----------+--------------------- > group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777 > group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848 > group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057 > (3 rows) > > ALTER MATERIALIZED VIEW group_imv RENAME TO group_imv2; > ALTER MATERIALIZED VIEW > List of relations > Schema | Name | Type | Owner > --------+------------+-------------------+---------- > public | group_imv2 | materialized view | postgres > public | table_x | table | postgres > (2 rows) > > Materialized view "public.group_imv2" > Column | Type | Collation | Nullable | Default > -------------------+---------+-----------+----------+--------- > part_key | text | | | > count | bigint | | | > max | numeric | | | > min | numeric | | | > sum | numeric | | | > avg | numeric | | | > __ivm_count_max__ | bigint | | | > __ivm_count_min__ | bigint | | | > __ivm_count_sum__ | bigint | | | > __ivm_count_avg__ | bigint | | | > __ivm_sum_avg__ | numeric | | | > __ivm_count__ | bigint | | | > > SET client_min_messages = debug5; > psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:30: DEBUG: > CommitTransaction(1) name: unnamed; blockState: STARTED; state: > INPROGRESS, xid/subid/cid: 0/1/0 > SET > INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2), > 'gruop_d'); > psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG: > StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, > xid/subid/cid: 0/1/0 > psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG: relation > "public.group_imv" does not exist > psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG: relation > "public.group_imv" does not exist > psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: ERROR: could not > open relation with OID 0 > RESET client_min_messages; > psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:34: DEBUG: > StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, > xid/subid/cid: 0/1/0 > RESET > SELECT * FROM group_imv2 ORDER BY part_key; > part_key | count | max | min | sum | avg > ----------+-------+-------+------+-----------+--------------------- > group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777 > group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848 > group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057 > (3 rows) > > ALTER MATERIALIZED VIEW group_imv2 RENAME TO group_imv; > ALTER MATERIALIZED VIEW > INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2), > 'gruop_d'); > INSERT 0 1 > SELECT * FROM group_imv ORDER BY part_key; > part_key | count | max | min | sum | avg > ----------+-------+-------+-------+-----------+--------------------- > group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777 > group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848 > group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057 > gruop_d | 1 | 81.43 | 81.43 | 81.43 | 81.4300000000000000 > (4 rows) > > [ec2-user@ip-10-0-1-10 ivm]$ > ``` > > This may be because IVM internal information is not modified when the view > name is renamed. > > 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>: > > > Hi, > > > > I would like to implement Incremental View Maintenance (IVM) on > > PostgreSQL. > > IVM is a technique to maintain materialized views which computes and > > applies > > only the incremental changes to the materialized views rather than > > recomputate the contents as the current REFRESH command does. > > > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > > [1]. > > Our implementation uses row OIDs to compute deltas for materialized > > views. > > The basic idea is that if we have information about which rows in base > > tables > > are contributing to generate a certain row in a matview then we can > > identify > > the affected rows when a base table is updated. This is based on an idea of > > Dr. Masunaga [2] who is a member of our group and inspired from ID-based > > approach[3]. > > > > In our implementation, the mapping of the row OIDs of the materialized view > > and the base tables are stored in "OID map". When a base relation is > > modified, > > AFTER trigger is executed and the delta is recorded in delta tables using > > the transition table feature. The accual udpate of the matview is triggerd > > by REFRESH command with INCREMENTALLY option. > > > > However, we realize problems of our implementation. First, WITH OIDS will > > be removed since PG12, so OIDs are no longer available. Besides this, it > > would > > be hard to implement this since it needs many changes of executor nodes to > > collect base tables's OIDs during execuing a query. Also, the cost of > > maintaining > > OID map would be high. > > > > For these reasons, we started to think to implement IVM without relying on > > OIDs > > and made a bit more surveys. > > > > We also looked at Kevin Grittner's discussion [4] on incremental matview > > maintenance. In this discussion, Kevin proposed to use counting algorithm > > [5] > > to handle projection views (using DISTNICT) properly. This algorithm need > > an > > additional system column, count_t, in materialized views and delta tables > > of > > base tables. > > > > However, the discussion about IVM is now stoped, so we would like to > > restart and > > progress this. > > > > > > Through our PoC inplementation and surveys, I think we need to think at > > least > > the followings for implementing IVM. > > > > 1. How to extract changes on base tables > > > > I think there would be at least two approaches for it. > > > > - Using transition table in AFTER triggers > > - Extracting changes from WAL using logical decoding > > > > In our PoC implementation, we used AFTER trigger and transition tables, > > but using > > logical decoding might be better from the point of performance of base > > table > > modification. > > > > If we can represent a change of UPDATE on a base table as query-like > > rather than > > OLD and NEW, it may be possible to update the materialized view directly > > instead > > of performing delete & insert. > > > > > > 2. How to compute the delta to be applied to materialized views > > > > Essentially, IVM is based on relational algebra. Theorically, changes on > > base > > tables are represented as deltas on this, like "R <- R + dR", and the > > delta on > > the materialized view is computed using base table deltas based on "change > > propagation equations". For implementation, we have to derive the > > equation from > > the view definition query (Query tree, or Plan tree?) and describe this as > > SQL > > query to compulte delta to be applied to the materialized view. > > > > There could be several operations for view definition: selection, > > projection, > > join, aggregation, union, difference, intersection, etc. If we can > > prepare a > > module for each operation, it makes IVM extensable, so we can start a > > simple > > view definition, and then support more complex views. > > > > > > 3. How to identify rows to be modifed in materialized views > > > > When applying the delta to the materialized view, we have to identify > > which row > > in the matview is corresponding to a row in the delta. A naive method is > > matching > > by using all columns in a tuple, but clearly this is unefficient. If > > thematerialized > > view has unique index, we can use this. Maybe, we have to force > > materialized views > > to have all primary key colums in their base tables. In our PoC > > implementation, we > > used OID to identify rows, but this will be no longer available as said > > above. > > > > > > 4. When to maintain materialized views > > > > There are two candidates of the timing of maintenance, immediate (eager) > > or deferred. > > > > In eager maintenance, the materialized view is updated in the same > > transaction > > where the base table is updated. In deferred maintenance, this is done > > after the > > transaction is commited, for example, when view is accessed, as a response > > to user > > request, etc. > > > > In the previous discussion[4], it is planned to start from "eager" > > approach. In our PoC > > implementaion, we used the other aproach, that is, using REFRESH command > > to perform IVM. > > I am not sure which is better as a start point, but I begin to think that > > the eager > > approach may be more simple since we don't have to maintain base table > > changes in other > > past transactions. > > > > In the eager maintenance approache, we have to consider a race condition > > where two > > different transactions change base tables simultaneously as discussed in > > [4]. > > > > > > [1] > > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ > > [2] > > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 > > (Japanese only) > > [3] https://dl.acm.org/citation.cfm?id=2750546 > > [4] > > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com > > [5] https://dl.acm.org/citation.cfm?id=170066 > > > > Regards, > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > > > -- Yugo Nagata <nagata@sraoss.co.jp>
Hi, Attached is the latest patch (v10) to add support for Incremental Materialized View Maintenance (IVM). IVM is a way to make materialized views up-to-date in which only incremental changes are computed and applied on views rather than recomputing the contents from scratch as REFRESH MATERIALIZED VIEW does. IVM can update materialized views more efficiently than recomputation when only small part of the view need updates. There are two approaches with regard to timing of view maintenance: immediate and deferred. In immediate maintenance, views are updated in the same transaction where its base table is modified. In deferred maintenance, views are updated after the transaction is committed, for example, when the view is accessed, as a response to user command like REFRESH, or periodically in background, and so on. This patch implements a kind of immediate maintenance, in which materialized views are updated immediately in AFTER triggers when a base table is modified. This supports views using: - inner and outer joins including self-join - some built-in aggregate functions (count, sum, agv, min, max) - a part of subqueries -- simple subqueries in FROM clause -- EXISTS subqueries in WHERE clause - DISTINCT and views with tuple duplicates === Here are major changes we made after the previous submitted patch: * Aggregate functions are checked if they can be used in IVM using their OID. Per comments from Alvaro Herrera. For this purpose, Gen_fmgrtab.pl was modified so that OIDs of aggregate functions are output to fmgroids.h. * Some bug fixes including: - Mistake of tab-completion of psql pointed out by nuko-san - A bug relating rename of matview pointed out by nuko-san - spelling errors - etc. * Add documentations for IVM * Patch is splited into eleven parts to make review easier as suggested by Amit Langote: - 0001: Add a new syntax: CREATE INCREMENTAL MATERIALIZED VIEW - 0002: Add a new column relisivm to pg_class - 0003: Change trigger.c to allow to prolong life span of tupestores containing Transition Tables generated via AFTER trigger - 0004: Add the basic IVM future using counting algorithm: This supports inner joins, DISTINCT, and tuple duplicates. - 0005: Change GEN_fmgrtab.pl to output aggregate function's OIDs - 0006: Add aggregates support for IVM - 0007: Add subqueries support for IVM - 0008: Add outer joins support for IVM - 0009: Add IVM support to psql command - 0010: Add regression tests for IVM - 0011: Add documentations for IVM === Todo: Currently, REFRESH and pg_dump/pg_restore is not supported, but we are working on them. Also, TRUNCATE is not supported. When TRUNCATE command is executed on a base table, nothing occurs on materialized views. We are now considering another better options, like: - Raise an error or warning when a base table is TRUNCATEed. - Make the view non-scannable (like WITH NO DATA) - Update the view in some ways. It would be easy for inner joins or aggregate views, but there is some difficult with outer joins. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Attachment
Hello, I'm starting to take a closer look at this feature. I've just finished reading the discussion, excluding other referencedmaterials. The following IVM wiki page returns an error. Does anybody know what's wrong? https://wiki.postgresql.org/wiki/Incremental_View_Maintenance [screen] ---------- MediaWiki internal error. Exception caught inside exception handler. Set $wgShowExceptionDetails = true; at the bottom of LocalSettings.php to show detailed debugging information. ---------- Could you give some concrete use cases, so that I can have a clearer image of the target data? In the discussion, someonereferred to master data with low update frequency, because the proposed IVM implementation adds triggers on sourcetables, which limits the applicability to update-heavy tables. Regards Takayuki Tsunakawa
> I'm starting to take a closer look at this feature. I've just finished reading the discussion, excluding other referencedmaterials. Thank you! > The following IVM wiki page returns an error. Does anybody know what's wrong? > > https://wiki.postgresql.org/wiki/Incremental_View_Maintenance I don't have any problem with the page. Maybe temporary error? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
An expression SELECT statement that is not IMMUTABLE must not be specified when creating a view.
In the current implementation, a SELECT statement containing an expression that is not IMMUTABLE can be specified when creating a view.
If an incremental materialized view is created from a SELECT statement that contains an expression that is not IMMUTABLE, applying the SELECT statement to the view returns incorrect results.
To prevent this, we propose that the same error occur when a non-IMMUTABLE expression is specified in the "CREATE INDEX" statement.
The following is an inappropriate example.
CREATE TABLE
CREATE VIEW base_v AS SELECT * FROM base
WHERE ts >= (now() - '3 second'::interval);
CREATE VIEW
CREATE MATERIALIZED VIEW base_mv AS SELECT * FROM base
WHERE ts >= (now() - '3 second'::interval);
SELECT 0
CREATE INCREMENTAL MATERIALIZED VIEW base_imv AS SELECT * FROM base
WHERE ts >= (now() - '3 second'::interval);
SELECT 0
View "public.base_v"
Column | Type | Collation | Nullable | Default | Storage | Description
--------+-----------------------------+-----------+----------+---------+----------+-------------
id | integer | | | | plain |
data | text | | | | extended |
ts | timestamp without time zone | | | | plain |
View definition:
SELECT base.id,
base.data,
base.ts
FROM base
WHERE base.ts >= (now() - '00:00:03'::interval);
Materialized view "public.base_mv"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
data | text | | | | extended | |
ts | timestamp without time zone | | | | plain | |
View definition:
SELECT base.id,
base.data,
base.ts
FROM base
WHERE base.ts >= (now() - '00:00:03'::interval);
Access method: heap
Materialized view "public.base_imv"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
data | text | | | | extended | |
ts | timestamp without time zone | | | | plain | |
__ivm_count__ | bigint | | | | plain | |
View definition:
SELECT base.id,
base.data,
base.ts
FROM base
WHERE base.ts >= (now() - '00:00:03'::interval);
Access method: heap
Incremental view maintenance: yes
INSERT INTO base VALUES (generate_series(1,3), 'dummy', clock_timestamp());
INSERT 0 3
SELECT * FROM base_v ORDER BY id;
id | data | ts
----+-------+----------------------------
1 | dummy | 2019-12-22 11:38:26.367481
2 | dummy | 2019-12-22 11:38:26.367599
3 | dummy | 2019-12-22 11:38:26.367606
(3 rows)
SELECT * FROM base_mv ORDER BY id;
id | data | ts
----+------+----
(0 rows)
REFRESH MATERIALIZED VIEW base_mv;
REFRESH MATERIALIZED VIEW
SELECT * FROM base_mv ORDER BY id;
id | data | ts
----+-------+----------------------------
1 | dummy | 2019-12-22 11:38:26.367481
2 | dummy | 2019-12-22 11:38:26.367599
3 | dummy | 2019-12-22 11:38:26.367606
(3 rows)
SELECT * FROM base_imv ORDER BY id;
id | data | ts
----+-------+----------------------------
1 | dummy | 2019-12-22 11:38:26.367481
2 | dummy | 2019-12-22 11:38:26.367599
3 | dummy | 2019-12-22 11:38:26.367606
(3 rows)
SELECT pg_sleep(3);
pg_sleep
----------
(1 row)
INSERT INTO base VALUES (generate_series(4,6), 'dummy', clock_timestamp());
INSERT 0 3
SELECT * FROM base_v ORDER BY id;
id | data | ts
----+-------+----------------------------
4 | dummy | 2019-12-22 11:38:29.381414
5 | dummy | 2019-12-22 11:38:29.381441
6 | dummy | 2019-12-22 11:38:29.381444
(3 rows)
SELECT * FROM base_mv ORDER BY id;
id | data | ts
----+-------+----------------------------
1 | dummy | 2019-12-22 11:38:26.367481
2 | dummy | 2019-12-22 11:38:26.367599
3 | dummy | 2019-12-22 11:38:26.367606
(3 rows)
REFRESH MATERIALIZED VIEW base_mv;
REFRESH MATERIALIZED VIEW
SELECT * FROM base_mv ORDER BY id;
id | data | ts
----+-------+----------------------------
4 | dummy | 2019-12-22 11:38:29.381414
5 | dummy | 2019-12-22 11:38:29.381441
6 | dummy | 2019-12-22 11:38:29.381444
(3 rows)
SELECT * FROM base_imv ORDER BY id;
id | data | ts
----+-------+----------------------------
1 | dummy | 2019-12-22 11:38:26.367481
2 | dummy | 2019-12-22 11:38:26.367599
3 | dummy | 2019-12-22 11:38:26.367606
4 | dummy | 2019-12-22 11:38:29.381414
5 | dummy | 2019-12-22 11:38:29.381441
6 | dummy | 2019-12-22 11:38:29.381444
(6 rows)
REFRESH MATERIALIZED VIEW base_mv;
REFRESH MATERIALIZED VIEW
SELECT * FROM base_imv ORDER BY id;
id | data | ts
----+-------+----------------------------
1 | dummy | 2019-12-22 11:38:26.367481
2 | dummy | 2019-12-22 11:38:26.367599
3 | dummy | 2019-12-22 11:38:26.367606
4 | dummy | 2019-12-22 11:38:29.381414
5 | dummy | 2019-12-22 11:38:29.381441
6 | dummy | 2019-12-22 11:38:29.381444
(6 rows)
Hi,
I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.
I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].
In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.
However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.
For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.
We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.
However, the discussion about IVM is now stoped, so we would like to restart and
progress this.
Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.
1. How to extract changes on base tables
I think there would be at least two approaches for it.
- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding
In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.
If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.
2. How to compute the delta to be applied to materialized views
Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.
There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.
3. How to identify rows to be modifed in materialized views
When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.
4. When to maintain materialized views
There are two candidates of the timing of maintenance, immediate (eager) or deferred.
In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.
In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.
In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].
[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066
Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>
Hello, First of all many thanks for this Great feature replacing so many triggers by a so simple syntax ;o) I was wondering about performances and add a look at pg_stat_statements (with track=all) with IVM_v9.patch. For each insert into a base table there are 3 statements: - ANALYZE pg_temp_3.pg_temp_81976 - WITH updt AS ( UPDATE public.mv1 AS mv SET __ivm_count__ = ... - DROP TABLE pg_temp_3.pg_temp_81976 It generates a lot of lines in pg_stat_statements with calls = 1. Thoses statements can not be shared because the temp table is dropped each time. Is there a plan to change this ? Many Thanks again Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
From: Tatsuo Ishii <ishii@sraoss.co.jp> > > The following IVM wiki page returns an error. Does anybody know what's > wrong? > > > > https://wiki.postgresql.org/wiki/Incremental_View_Maintenance > > I don't have any problem with the page. Maybe temporary error? Yeah, I can see it now. I could see it on the weekend. The page was not available for at least an hour or so when I askedabout this. I thought the Pgsql-www team kindly solved the issue. Regards Takayuki Tsunakawa
On Sun, 22 Dec 2019 20:54:41 +0900 nuko yokohama <nuko.yokohama@gmail.com> wrote: > SELECT statement that is not IMMUTABLE must not be specified when creating > a view. > > An expression SELECT statement that is not IMMUTABLE must not be specified > when creating a view. > > In the current implementation, a SELECT statement containing an expression > that is not IMMUTABLE can be specified when creating a view. > If an incremental materialized view is created from a SELECT statement that > contains an expression that is not IMMUTABLE, applying the SELECT statement > to the view returns incorrect results. > To prevent this, we propose that the same error occur when a non-IMMUTABLE > expression is specified in the "CREATE INDEX" statement. Thank you for pointing out this. That makes sense. The check of not-IMMUTABLE epressions is missing at creating IMMV. We'll add this. Thanks, Yugo Nagata > > The following is an inappropriate example. > ---- > CREATE TABLE base (id int primary key, data text, ts timestamp); > CREATE TABLE > CREATE VIEW base_v AS SELECT * FROM base > WHERE ts >= (now() - '3 second'::interval); > CREATE VIEW > CREATE MATERIALIZED VIEW base_mv AS SELECT * FROM base > WHERE ts >= (now() - '3 second'::interval); > SELECT 0 > CREATE INCREMENTAL MATERIALIZED VIEW base_imv AS SELECT * FROM base > WHERE ts >= (now() - '3 second'::interval); > SELECT 0 > View "public.base_v" > Column | Type | Collation | Nullable | Default | > Storage | Description > --------+-----------------------------+-----------+----------+---------+----------+------------- > id | integer | | | | > plain | > data | text | | | | > extended | > ts | timestamp without time zone | | | | > plain | > View definition: > SELECT base.id, > base.data, > base.ts > FROM base > WHERE base.ts >= (now() - '00:00:03'::interval); > > Materialized view "public.base_mv" > Column | Type | Collation | Nullable | Default | > Storage | Stats target | Description > --------+-----------------------------+-----------+----------+---------+----------+--------------+------------- > id | integer | | | | > plain | | > data | text | | | | > extended | | > ts | timestamp without time zone | | | | > plain | | > View definition: > SELECT base.id, > base.data, > base.ts > FROM base > WHERE base.ts >= (now() - '00:00:03'::interval); > Access method: heap > > Materialized view "public.base_imv" > Column | Type | Collation | Nullable | > Default | Storage | Stats target | Description > ---------------+-----------------------------+-----------+----------+---------+----------+--------------+------------- > id | integer | | | > | plain | | > data | text | | | > | extended | | > ts | timestamp without time zone | | | > | plain | | > __ivm_count__ | bigint | | | > | plain | | > View definition: > SELECT base.id, > base.data, > base.ts > FROM base > WHERE base.ts >= (now() - '00:00:03'::interval); > Access method: heap > Incremental view maintenance: yes > > INSERT INTO base VALUES (generate_series(1,3), 'dummy', clock_timestamp()); > INSERT 0 3 > SELECT * FROM base_v ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > (3 rows) > > SELECT * FROM base_mv ORDER BY id; > id | data | ts > ----+------+---- > (0 rows) > > REFRESH MATERIALIZED VIEW base_mv; > REFRESH MATERIALIZED VIEW > SELECT * FROM base_mv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > (3 rows) > > SELECT * FROM base_imv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > (3 rows) > > SELECT pg_sleep(3); > pg_sleep > ---------- > > (1 row) > > INSERT INTO base VALUES (generate_series(4,6), 'dummy', clock_timestamp()); > INSERT 0 3 > SELECT * FROM base_v ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 4 | dummy | 2019-12-22 11:38:29.381414 > 5 | dummy | 2019-12-22 11:38:29.381441 > 6 | dummy | 2019-12-22 11:38:29.381444 > (3 rows) > > SELECT * FROM base_mv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > (3 rows) > > REFRESH MATERIALIZED VIEW base_mv; > REFRESH MATERIALIZED VIEW > SELECT * FROM base_mv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 4 | dummy | 2019-12-22 11:38:29.381414 > 5 | dummy | 2019-12-22 11:38:29.381441 > 6 | dummy | 2019-12-22 11:38:29.381444 > (3 rows) > > SELECT * FROM base_imv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > 4 | dummy | 2019-12-22 11:38:29.381414 > 5 | dummy | 2019-12-22 11:38:29.381441 > 6 | dummy | 2019-12-22 11:38:29.381444 > (6 rows) > > REFRESH MATERIALIZED VIEW base_mv; > REFRESH MATERIALIZED VIEW > SELECT * FROM base_imv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > 4 | dummy | 2019-12-22 11:38:29.381414 > 5 | dummy | 2019-12-22 11:38:29.381441 > 6 | dummy | 2019-12-22 11:38:29.381444 > (6 rows) > ---- > > 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>: > > > Hi, > > > > I would like to implement Incremental View Maintenance (IVM) on > > PostgreSQL. > > IVM is a technique to maintain materialized views which computes and > > applies > > only the incremental changes to the materialized views rather than > > recomputate the contents as the current REFRESH command does. > > > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > > [1]. > > Our implementation uses row OIDs to compute deltas for materialized > > views. > > The basic idea is that if we have information about which rows in base > > tables > > are contributing to generate a certain row in a matview then we can > > identify > > the affected rows when a base table is updated. This is based on an idea of > > Dr. Masunaga [2] who is a member of our group and inspired from ID-based > > approach[3]. > > > > In our implementation, the mapping of the row OIDs of the materialized view > > and the base tables are stored in "OID map". When a base relation is > > modified, > > AFTER trigger is executed and the delta is recorded in delta tables using > > the transition table feature. The accual udpate of the matview is triggerd > > by REFRESH command with INCREMENTALLY option. > > > > However, we realize problems of our implementation. First, WITH OIDS will > > be removed since PG12, so OIDs are no longer available. Besides this, it > > would > > be hard to implement this since it needs many changes of executor nodes to > > collect base tables's OIDs during execuing a query. Also, the cost of > > maintaining > > OID map would be high. > > > > For these reasons, we started to think to implement IVM without relying on > > OIDs > > and made a bit more surveys. > > > > We also looked at Kevin Grittner's discussion [4] on incremental matview > > maintenance. In this discussion, Kevin proposed to use counting algorithm > > [5] > > to handle projection views (using DISTNICT) properly. This algorithm need > > an > > additional system column, count_t, in materialized views and delta tables > > of > > base tables. > > > > However, the discussion about IVM is now stoped, so we would like to > > restart and > > progress this. > > > > > > Through our PoC inplementation and surveys, I think we need to think at > > least > > the followings for implementing IVM. > > > > 1. How to extract changes on base tables > > > > I think there would be at least two approaches for it. > > > > - Using transition table in AFTER triggers > > - Extracting changes from WAL using logical decoding > > > > In our PoC implementation, we used AFTER trigger and transition tables, > > but using > > logical decoding might be better from the point of performance of base > > table > > modification. > > > > If we can represent a change of UPDATE on a base table as query-like > > rather than > > OLD and NEW, it may be possible to update the materialized view directly > > instead > > of performing delete & insert. > > > > > > 2. How to compute the delta to be applied to materialized views > > > > Essentially, IVM is based on relational algebra. Theorically, changes on > > base > > tables are represented as deltas on this, like "R <- R + dR", and the > > delta on > > the materialized view is computed using base table deltas based on "change > > propagation equations". For implementation, we have to derive the > > equation from > > the view definition query (Query tree, or Plan tree?) and describe this as > > SQL > > query to compulte delta to be applied to the materialized view. > > > > There could be several operations for view definition: selection, > > projection, > > join, aggregation, union, difference, intersection, etc. If we can > > prepare a > > module for each operation, it makes IVM extensable, so we can start a > > simple > > view definition, and then support more complex views. > > > > > > 3. How to identify rows to be modifed in materialized views > > > > When applying the delta to the materialized view, we have to identify > > which row > > in the matview is corresponding to a row in the delta. A naive method is > > matching > > by using all columns in a tuple, but clearly this is unefficient. If > > thematerialized > > view has unique index, we can use this. Maybe, we have to force > > materialized views > > to have all primary key colums in their base tables. In our PoC > > implementation, we > > used OID to identify rows, but this will be no longer available as said > > above. > > > > > > 4. When to maintain materialized views > > > > There are two candidates of the timing of maintenance, immediate (eager) > > or deferred. > > > > In eager maintenance, the materialized view is updated in the same > > transaction > > where the base table is updated. In deferred maintenance, this is done > > after the > > transaction is commited, for example, when view is accessed, as a response > > to user > > request, etc. > > > > In the previous discussion[4], it is planned to start from "eager" > > approach. In our PoC > > implementaion, we used the other aproach, that is, using REFRESH command > > to perform IVM. > > I am not sure which is better as a start point, but I begin to think that > > the eager > > approach may be more simple since we don't have to maintain base table > > changes in other > > past transactions. > > > > In the eager maintenance approache, we have to consider a race condition > > where two > > different transactions change base tables simultaneously as discussed in > > [4]. > > > > > > [1] > > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ > > [2] > > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 > > (Japanese only) > > [3] https://dl.acm.org/citation.cfm?id=2750546 > > [4] > > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com > > [5] https://dl.acm.org/citation.cfm?id=170066 > > > > Regards, > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > > > -- Yugo Nagata <nagata@sraoss.co.jp>
> Could you give some concrete use cases, so that I can have a clearer image of the target data? In the discussion, someonereferred to master data with low update frequency, because the proposed IVM implementation adds triggers on sourcetables, which limits the applicability to update-heavy tables. But if you want to get always up-to-data you need to pay the cost for REFRESH MATERIALIZED VIEW. IVM gives a choice here. pgbench -s 100 create materialized view mv1 as select count(*) from pgbench_accounts; create incremental materialized view mv2 as select count(*) from pgbench_accounts; Now I delete one row from pgbench_accounts. test=# delete from pgbench_accounts where aid = 10000000; DELETE 1 Time: 12.387 ms Of course this makes mv1's data obsolete: test=# select * from mv1; count ---------- 10000000 (1 row) To reflect the fact on mv1 that a row was deleted from pgbench_accounts, you need to refresh mv1: test=# refresh materialized view mv1; REFRESH MATERIALIZED VIEW Time: 788.757 ms which takes 788ms. With mv2 you don't need to pay this cost to get the latest data. This is kind of ideal use case for IVM and I do not claim that IVM always wins over ordinary materialized view (or non materialized view). IVM will give benefit in that a materialized view instantly updated whenever base tables get updated with a cost of longer update time on base tables. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
From: legrand legrand <legrand_legrand@hotmail.com> > For each insert into a base table there are 3 statements: > - ANALYZE pg_temp_3.pg_temp_81976 > - WITH updt AS ( UPDATE public.mv1 AS mv SET __ivm_count__ = ... > - DROP TABLE pg_temp_3.pg_temp_81976 Does it also include CREATE TEMPORARY TABLE, because there's DROP? I remember that repeated CREATE and DROP of temporary tables should be avoided in PostgreSQL. Dropped temporary tables leavesome unused memory in CacheMemoryContext. If creation and deletion of temporary tables are done per row in a singlesession, say loading of large amount of data, memory bloat could crash the OS. That actually happened at a user'senvironment. Plus, repeated create/drop may cause system catalog bloat as well even when they are performed in different sessions. Ina fortunate case, the garbage records gather at the end of the system tables, and autovacuum will free those empty areasby truncating data files. However, if some valid entry persists after the long garbage area, the system tables wouldremain bloated. What kind of workload and data are you targeting with IVM? Regards Takayuki Tsunakawa
On Mon, 23 Dec 2019 02:26:09 +0000 "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote: > From: legrand legrand <legrand_legrand@hotmail.com> > > For each insert into a base table there are 3 statements: > > - ANALYZE pg_temp_3.pg_temp_81976 > > - WITH updt AS ( UPDATE public.mv1 AS mv SET __ivm_count__ = ... > > - DROP TABLE pg_temp_3.pg_temp_81976 > > Does it also include CREATE TEMPORARY TABLE, because there's DROP? CREATE TEMPRARY TABLE is not called because temptables are created by make_new_heap() instead of queries via SPI. > I remember that repeated CREATE and DROP of temporary tables should be avoided in PostgreSQL. Dropped temporary tablesleave some unused memory in CacheMemoryContext. If creation and deletion of temporary tables are done per row in asingle session, say loading of large amount of data, memory bloat could crash the OS. That actually happened at a user'senvironment. > Plus, repeated create/drop may cause system catalog bloat as well even when they are performed in different sessions. In a fortunate case, the garbage records gather at the end of the system tables, and autovacuum will free those empty areasby truncating data files. However, if some valid entry persists after the long garbage area, the system tables wouldremain bloated. Thank you for explaining the problem. I understood that creating and dropping temprary tables is harmful more than I have thought. Although this is not a concrete plan, there are two ideas to reduce creating temporary tables: 1. Create a temporary table only once at the first view maintenance in this session. This is possible if we store names or oid of temporary tables used for each materialized view in memory. However, users may access to these temptables whenever during the session. 2. Use tuplestores instead of temprary tables. Tuplestores can be converted to Ephemeral Name Relation (ENR) and used in queries. It doesn't need updating system catalogs, but indexes can not be used to access. > > What kind of workload and data are you targeting with IVM? IVM (with immediate maintenance approach) would be efficient in situations where modifications on base tables are not frequent. In such situations, create and drop of temptalbes is not so frequent either, but it would be still possible that the problem you concern occurs. So, it seems worth to consider the way to reduce use of temptable. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
On Mon, Dec 23, 2019 at 7:51 AM Yugo Nagata <nagata@sraoss.co.jp> wrote: > > On Mon, 23 Dec 2019 02:26:09 +0000 > "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote: > > > From: legrand legrand <legrand_legrand@hotmail.com> > > > For each insert into a base table there are 3 statements: > > > - ANALYZE pg_temp_3.pg_temp_81976 > > > - WITH updt AS ( UPDATE public.mv1 AS mv SET __ivm_count__ = ... > > > - DROP TABLE pg_temp_3.pg_temp_81976 > > > > Does it also include CREATE TEMPORARY TABLE, because there's DROP? > > CREATE TEMPRARY TABLE is not called because temptables are created > by make_new_heap() instead of queries via SPI. > > > I remember that repeated CREATE and DROP of temporary tables should be avoided in PostgreSQL. Dropped temporary tablesleave some unused memory in CacheMemoryContext. If creation and deletion of temporary tables are done per row in asingle session, say loading of large amount of data, memory bloat could crash the OS. That actually happened at a user'senvironment. > > > Plus, repeated create/drop may cause system catalog bloat as well even when they are performed in different sessions. In a fortunate case, the garbage records gather at the end of the system tables, and autovacuum will free thoseempty areas by truncating data files. However, if some valid entry persists after the long garbage area, the systemtables would remain bloated. > > Thank you for explaining the problem. I understood that creating and > dropping temprary tables is harmful more than I have thought. Although > this is not a concrete plan, there are two ideas to reduce creating > temporary tables: For the pg_stat_statements point of view, utility command support is already quite bad as with many workloads it's rather impossible to activate track_utility as it'd otherwise pollute the hashtable with an infinity of queries executed only once (random prepared transaction name, random cursor names...). I'm wondering whether we should normalize utility statements deparsing the utilityStmt, and also normalizing some identifiers (maybe optionally with a GUC), eg. "DECLARE ? AS CURSOR FOR normalized_query_here". However commands like vacuum or drop would be better kept as-is.
From: Tatsuo Ishii <ishii@sraoss.co.jp> > the target data? In the discussion, someone referred to master data with low > update frequency, because the proposed IVM implementation adds triggers on > source tables, which limits the applicability to update-heavy tables. > > But if you want to get always up-to-data you need to pay the cost for > REFRESH MATERIALIZED VIEW. IVM gives a choice here. Thank you, that clarified to some extent. What kind of data do you think of as an example? Materialized view reminds me of the use in a data warehouse. Oracle handles the top in its Database Data Warehousing Guide,and Microsoft has just started to offer the materialized view feature in its Azure Synapse Analytics (formerly SQLData Warehouse). AWS also has previewed Redshift's materialized view feature in re:Invent 2019. Are you targeting thedata warehouse (analytics) workload? IIUC, to put (over) simply, the data warehouse has two kind of tables: * Facts (transaction data): e.g. sales, user activity Large amount. INSERT only on a regular basis (ETL/ELT) or continuously (streaming) * Dimensions (master/reference data): e.g. product, customer, time, country Small amount. Infrequently INSERTed or UPDATEd. The proposed trigger-based approach does not seem to be suitable for the facts, because the trigger overhead imposed on dataloading may offset or exceed the time saved by incrementally refreshing the materialized views. Then, does the proposed feature fit the dimension tables? If the materialized view is only based on the dimension data,then the full REFRESH of the materialized view wouldn't take so long. The typical materialized view should join thefact and dimension tables. Then, the fact table will have to have the triggers, causing the data loading slowdown. I'm saying this because I'm concerned about the trigger based overhead. As you know, Oracle uses materialized view logsto save changes and incrementally apply them later to the materialized views (REFRESH ON STATEMENT materialized viewsdoesn't require the materialized view log, so it might use triggers.) Does any commercial grade database implementmaterialized view using triggers? I couldn't find relevant information regarding Azure Synapse and Redshift. If our only handy option is a trigger, can we minimize the overhead by doing the view maintenance at transaction commit? Regards Takayuki Tsunakawa
From: Yugo Nagata <nagata@sraoss.co.jp> > 1. Create a temporary table only once at the first view maintenance in > this session. This is possible if we store names or oid of temporary > tables used for each materialized view in memory. However, users may > access to these temptables whenever during the session. > > 2. Use tuplestores instead of temprary tables. Tuplestores can be > converted to Ephemeral Name Relation (ENR) and used in queries. > It doesn't need updating system catalogs, but indexes can not be > used to access. How about unlogged tables ? I thought the point of using a temp table is to avoid WAL overhead. One concern about the temp table is that it precludes the use of distributed transactions (PREPARE TRANSACTION fails if thetransaction accessed a temp table.) This could become a headache when FDW has supported 2PC (which Sawada-san startedand Horicuchi-san has taken over.) In the near future, PostgreSQL may evolve into a shared nothing database withdistributed transactions like Postgres-XL. Regards Takayuki Tsunakawa
>> But if you want to get always up-to-data you need to pay the cost for >> REFRESH MATERIALIZED VIEW. IVM gives a choice here. > > Thank you, that clarified to some extent. What kind of data do you think of as an example? > > Materialized view reminds me of the use in a data warehouse. Oracle handles the top in its Database Data Warehousing Guide,and Microsoft has just started to offer the materialized view feature in its Azure Synapse Analytics (formerly SQLData Warehouse). AWS also has previewed Redshift's materialized view feature in re:Invent 2019. Are you targeting thedata warehouse (analytics) workload? First of all, we do not think that current approach is the final one. Instead we want to implement IVM feature one by one: i.e. we start with "immediate update" approach, because it's simple and easier to implement. Then we will add "deferred update" mode later on. In fact Oracle has both "immediate update" and "deferred update" mode of IVM (actually there are more "mode" with their implementation). I recommend you to look into Oracle's materialized view feature closely. For fair evaluation, probably we should compare the IVM patch with Oracle's "immediate update" (they call it "on statement") mode. > IIUC, to put (over) simply, the data warehouse has two kind of tables: Probably deferred IVM mode is more suitable for DWH. However as I said earlier, we hope to implement the immediate mode first then add the deferred mode. Let's start with simple one then add more features. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On Mon, 23 Dec 2019 08:08:53 +0000 "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote: > From: Yugo Nagata <nagata@sraoss.co.jp> > > 1. Create a temporary table only once at the first view maintenance in > > this session. This is possible if we store names or oid of temporary > > tables used for each materialized view in memory. However, users may > > access to these temptables whenever during the session. > > > > 2. Use tuplestores instead of temprary tables. Tuplestores can be > > converted to Ephemeral Name Relation (ENR) and used in queries. > > It doesn't need updating system catalogs, but indexes can not be > > used to access. > > How about unlogged tables ? I thought the point of using a temp table is to avoid WAL overhead. Hmm... this might be another option. However, if we use unlogged tables, we will need to create them in a special schema similar to pg_toast to split this from user tables. Otherwise, we need to create and drop unlogged tables repeatedly for each session. > > One concern about the temp table is that it precludes the use of distributed transactions (PREPARE TRANSACTION fails ifthe transaction accessed a temp table.) This could become a headache when FDW has supported 2PC (which Sawada-san startedand Horicuchi-san has taken over.) In the near future, PostgreSQL may evolve into a shared nothing database withdistributed transactions like Postgres-XL. This makes sense since you mean that PREPARE TRANSACTION can not be used if any base table of incrementally maintainable materialized views is modified in the transaction, at least in the immediate maintenance. Maybe, this issue can be resolved if we implement the deferred maintenance planned in future because materialized views can be updated in other transactions in this way. > > > Regards > Takayuki Tsunakawa > > > -- Yugo Nagata <nagata@sraoss.co.jp>
Hello, regarding my initial post: > For each insert into a base table there are 3 statements: > - ANALYZE pg_temp_3.pg_temp_81976 > - WITH updt AS ( UPDATE public.mv1 AS mv SET __ivm_count__ = ... > - DROP TABLE pg_temp_3.pg_temp_81976 For me there where 3 points to discuss: - create/drop tables may bloat dictionnary tables - create/drop tables prevents "WITH updt ..." from being shared (with some plan caching) - generates many lines in pg_stat_statements In fact I like the idea of a table created per session, but I would even prefer a common "table" shared between all sessions like GLOBAL TEMPORARY TABLE (or something similar) as described here: https://www.postgresql.org/message-id/flat/157703426606.1198.2452090605041230054.pgcf%40coridan.postgresql.org#331e8344bbae904350af161fb43a0aa6 That would remove the drop/create issue, permits to reduce planning time for "WITH updt ..." statements (as done today in PLpgsql triggers), and would fix the pgss "bloat" issue. Like that the "cost" of the immediate refresh approach would be easier to support ;o) Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Mon, 23 Dec 2019 03:41:18 -0700 (MST) legrand legrand <legrand_legrand@hotmail.com> wrote: > Hello, > regarding my initial post: > > > For each insert into a base table there are 3 statements: > > - ANALYZE pg_temp_3.pg_temp_81976 > > - WITH updt AS ( UPDATE public.mv1 AS mv SET __ivm_count__ = ... > > - DROP TABLE pg_temp_3.pg_temp_81976 > > For me there where 3 points to discuss: > - create/drop tables may bloat dictionnary tables > - create/drop tables prevents "WITH updt ..." from being shared (with some > plan caching) > - generates many lines in pg_stat_statements > > In fact I like the idea of a table created per session, but I would even > prefer a common "table" shared between all sessions like GLOBAL TEMPORARY > TABLE (or something similar) as described here: > https://www.postgresql.org/message-id/flat/157703426606.1198.2452090605041230054.pgcf%40coridan.postgresql.org#331e8344bbae904350af161fb43a0aa6 Although I have not looked into this thread, this may be help if this is implemented. However, it would be still necessary to truncate the table before the view maintenance because such tables always exist and can be accessed and modified by any users. -- Yugo Nagata <nagata@sraoss.co.jp>
From: Tatsuo Ishii <ishii@sraoss.co.jp> > First of all, we do not think that current approach is the final > one. Instead we want to implement IVM feature one by one: i.e. we > start with "immediate update" approach, because it's simple and easier > to implement. Then we will add "deferred update" mode later on. I agree about incremental feature introduction. What I'm simply asking is the concrete use case (workload and data), sothat I can convince myself to believe that this feature is useful and focus on reviewing and testing (because the patchseems big and difficult...) > In fact Oracle has both "immediate update" and "deferred update" mode > of IVM (actually there are more "mode" with their implementation). > > I recommend you to look into Oracle's materialized view feature > closely. For fair evaluation, probably we should compare the IVM patch > with Oracle's "immediate update" (they call it "on statement") mode. > > Probably deferred IVM mode is more suitable for DWH. However as I said > earlier, we hope to implement the immediate mode first then add the > deferred mode. Let's start with simple one then add more features. Yes, I know Oracle's ON STATEMENT refresh mode (I attached references at the end for others.) Unfortunately, it's not clear to me which of ON STATEMENT or ON COMMIT the user should choose. The benefit of ON STATEMENTis that the user does not have to create and maintain the materialized view log. But I'm not sure if and when thebenefit defeats the performance overhead on DML statements. It's not disclosed whether ON STATEMENT uses triggers. Could you give your opinion on the following to better understand the proposed feature and/or Oracle's ON STATEMENT refreshmode? * What use case does the feature fit? If the trigger makes it difficult to use in the data ware house, does the feature target OLTP? What kind of data and query would benefit most from the feature (e.g. join of a large sales table and a small product table,where the data volume and frequency of data loading is ...)? In other words, this is about what kind of example we can recommend as a typical use case of this feature. * Do you think the benefit of ON STATEMENT (i.e. do not have to use materialized view log) outweighs the drawback of ON STATEMENT (i.g. DML overhead)? * Do you think it's important to refresh the materialized view after every statement, or the per-statement refresh is nota requirement but simply the result of implementation? [References] https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/refreshing-materialized-views.html#GUID-C40C225A-8328-44D5-AE90-9078C2C773EA -------------------------------------------------- 7.1.5 About ON COMMIT Refresh for Materialized Views A materialized view can be refreshed automatically using the ON COMMIT method. Therefore, whenever a transaction commitswhich has updated the tables on which a materialized view is defined, those changes are automatically reflected inthe materialized view. The advantage of using this approach is you never have to remember to refresh the materialized view.The only disadvantage is the time required to complete the commit will be slightly longer because of the extra processinginvolved. However, in a data warehouse, this should not be an issue because there is unlikely to be concurrentprocesses trying to update the same table. 7.1.6 About ON STATEMENT Refresh for Materialized Views A materialized view that uses the ON STATEMENT refresh mode is automatically refreshed every time a DML operation is performedon any of the materialized view’s base tables. With the ON STATEMENT refresh mode, any changes to the base tables are immediately reflected in the materialized view. Thereis no need to commit the transaction or maintain materialized view logs on the base tables. If the DML statements aresubsequently rolled back, then the corresponding changes made to the materialized view are also rolled back. The advantage of the ON STATEMENT refresh mode is that the materialized view is always synchronized with the data in thebase tables, without the overhead of maintaining materialized view logs. However, this mode may increase the time takento perform a DML operation because the materialized view is being refreshed as part of the DML operation. -------------------------------------------------- https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/release-changes.html#GUID-2A2D6E3B-A3FD-47A8-82A3-1EF95AEF5993 -------------------------------------------------- ON STATEMENT refresh mode for materialized views The ON STATEMENT refresh mode refreshes materialized views every time a DML operation is performed on any base table, withoutthe need to commit the transaction. This mode does not require you to maintain materialized view logs on the basetables. -------------------------------------------------- http://www.oracle.com/us/solutions/sap/matview-refresh-db12c-2877319.pdf -------------------------------------------------- We have introduced a new Materialized View (MV) refresh mechanism called ON STATEMENT refresh. With the ON STATEMENT refreshmethod, an MV is automatically refreshed whenever DML happens on a base table of the MV. Therefore, whenever a DMLhappens on any table on which a materialized view is defined, the change is automatically reflected in the materializedview. The advantage of using this approach is that the user no long needs to create a materialized view log oneach of the base table in order to do fast refresh. The refresh can then avoid the overhead introduced by MV logging butstill keep the materialized view refreshed all the time. Specify ON STATEMENT to indicate that a fast refresh is to occur whenever DML happens on a base table of the materializedview. This is to say, ON STATEMENT materialized view is always in sync with base table changes even before thetransaction commits. If a transaction that made changes to the base tables rolls back, the corresponding changes in onstatement MV are rolled back as well. This clause may increase the time taken to complete a DML, because the database performsthe refresh operation as part of the DML execution. However, unlike other types of fast refreshable materializedviews, ON STATEMENT MV refresh no longer requires MV log on the base tables or any extra work on MV logs in orderto do fast refresh. -------------------------------------------------- Regards Takayuki Tsunakawa
From: Yugo Nagata <nagata@sraoss.co.jp> > On Mon, 23 Dec 2019 08:08:53 +0000 > "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote: > > How about unlogged tables ? I thought the point of using a temp table is to > avoid WAL overhead. > > Hmm... this might be another option. However, if we use unlogged tables, > we will need to create them in a special schema similar to pg_toast > to split this from user tables. Otherwise, we need to create and drop > unlogged tables repeatedly for each session. Maybe we can create the work tables in the same schema as the materialized view, following: * Prefix the table name to indicate that the table is system-managed, thus alluding to the user that manually deleting thetable would break something. This is like the system attribute __imv_count you are proposing. * Describe the above in the manual. Columns of serial and bigserial data type similarly create sequences behind the scenes. * Make the work tables depend on the materialized view by recording the dependency in pg_depend, so that Dropping the materializedview will also drop its work tables. Regards Takayuki Tsunakawa
> Unfortunately, it's not clear to me which of ON STATEMENT or ON COMMIT the user should choose. The benefit of ON STATEMENTis that the user does not have to create and maintain the materialized view log. But I'm not sure if and when thebenefit defeats the performance overhead on DML statements. It's not disclosed whether ON STATEMENT uses triggers. AFAIK benefit of ON STATEMENT is the transaction can see the result of update to the base tables. With ON COMMIT, the transaction does not see the result until the transaction commits. > Could you give your opinion on the following to better understand the proposed feature and/or Oracle's ON STATEMENT refreshmode? > > * What use case does the feature fit? > If the trigger makes it difficult to use in the data ware house, does the feature target OLTP? Well, I can see use cases of IVM in both DWH and OLTP. For example, a user create a DWH-like data using materialized view. After the initial data is loaded, the data is seldom updated. However one day a user wants to change just one row to see how it affects to the whole DWH data. IVM will help here because it could be done in shorter time than loading whole data. Another use case is a ticket selling system. The system shows how many tickets remain in a real time manner. For this purpose it needs to count the number of tickets already sold from a log table. By using IVM, it could be accomplished in simple and effective way. > What kind of data and query would benefit most from the feature (e.g. join of a large sales table and a small product table,where the data volume and frequency of data loading is ...)? > In other words, this is about what kind of example we can recommend as a typical use case of this feature. Here are some use cases suitable for IVM I can think of: - Users are creating home made triggers to get data from tables. Since IVM could eliminates some of those triggers, we could expect less maintenance cost and bugs accidentally brought in when the triggers were created. - Any use case in which the cost of refreshing whole result table (materialized view) is so expensive that it justifies the cost of updating of base tables. See the example of use cases above. > * Do you think the benefit of ON STATEMENT (i.e. do not have to use materialized view log) outweighs the drawback of ON STATEMENT (i.g. DML overhead)? Outweights to what? > * Do you think it's important to refresh the materialized view after every statement, or the per-statement refresh is nota requirement but simply the result of implementation? I think it's important to refresh the materialized view after every statement and the benefit for users are apparent because it brings real time data refresh to users. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Yugo Nagata wrote > On Mon, 23 Dec 2019 03:41:18 -0700 (MST) > legrand legrand < > legrand_legrand@ > > wrote: > > [ ...] > >> I would even >> prefer a common "table" shared between all sessions like GLOBAL TEMPORARY >> TABLE (or something similar) as described here: >> https://www.postgresql.org/message-id/flat/157703426606.1198.2452090605041230054.pgcf%40coridan.postgresql.org#331e8344bbae904350af161fb43a0aa6 > > Although I have not looked into this thread, this may be help if this is > implemented. However, it would be still necessary to truncate the table > before the view maintenance because such tables always exist and can be > accessed and modified by any users. > > -- > Yugo Nagata < > nagata@.co > > For information, in this table data is PRIVATE to each session, can be purged on the ON COMMIT event and disappear at SESSION end. Yes, this feature could be utile only if it's implemented. And you are rigth some data has to be deleted on the ON STATEMENT event (not sure if TRUNCATE is Global or Session specific in this situation). -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
> Materialized view reminds me of the use in a data warehouse. Oracle handles the top in its Database Data Warehousing Guide,and Microsoft has just started to offer the materialized view feature in its Azure Synapse Analytics (formerly SQLData Warehouse). AWS also has previewed Redshift's materialized view feature in re:Invent 2019. Are you targeting thedata warehouse (analytics) workload? > > IIUC, to put (over) simply, the data warehouse has two kind of tables: > > * Facts (transaction data): e.g. sales, user activity > Large amount. INSERT only on a regular basis (ETL/ELT) or continuously (streaming) > > * Dimensions (master/reference data): e.g. product, customer, time, country > Small amount. Infrequently INSERTed or UPDATEd. > > > The proposed trigger-based approach does not seem to be suitable for the facts, because the trigger overhead imposed ondata loading may offset or exceed the time saved by incrementally refreshing the materialized views. I think that depends on use case of the DWH. If the freshness of materialized view tables is important for a user, then the cost of the trigger overhead may be acceptable for the user. > Then, does the proposed feature fit the dimension tables? If the materialized view is only based on the dimension data,then the full REFRESH of the materialized view wouldn't take so long. The typical materialized view should join thefact and dimension tables. Then, the fact table will have to have the triggers, causing the data loading slowdown. > > I'm saying this because I'm concerned about the trigger based overhead. As you know, Oracle uses materialized view logsto save changes and incrementally apply them later to the materialized views (REFRESH ON STATEMENT materialized viewsdoesn't require the materialized view log, so it might use triggers.) Does any commercial grade database implementmaterialized view using triggers? I couldn't find relevant information regarding Azure Synapse and Redshift. I heard that REFRESH ON STATEMENT of Oracle has been added after ON COMMIT materialized view. So I suspect Oracle realizes that there are needs/use case for ON STATEMENT, but I am not sure. > If our only handy option is a trigger, can we minimize the overhead by doing the view maintenance at transaction commit? I am not sure it's worth the trouble. If it involves some form of logging, then I think it should be used for deferred IVM first because it has more use case than on commit IVM. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
From: Tatsuo Ishii <ishii@sraoss.co.jp> > AFAIK benefit of ON STATEMENT is the transaction can see the result of > update to the base tables. With ON COMMIT, the transaction does not > see the result until the transaction commits. > > Well, I can see use cases of IVM in both DWH and OLTP. > > For example, a user create a DWH-like data using materialized > view. After the initial data is loaded, the data is seldom updated. > However one day a user wants to change just one row to see how it > affects to the whole DWH data. IVM will help here because it could be > done in shorter time than loading whole data. > I heard that REFRESH ON STATEMENT of Oracle has been added after ON > COMMIT materialized view. So I suspect Oracle realizes that there are > needs/use case for ON STATEMENT, but I am not sure. Yes, it was added relatively recently in Oracle Database 12.2. As the following introduction to new features shows, thebenefits are described as twofold: 1) The transaction can see the refreshed view result without committing. 2) The materialized view log is not needed. I guess from these that the ON STATEMENT refresh mode can be useful when the user wants to experiment with some changes tosee how data change could affect the analytics result, without persisting the change. I think that type of experimentis done in completely or almost static data marts where the user is allowed to modify the data freely. The ONSTATEMENT refresh mode wouldn't be for the DWH that requires high-performance, regular and/or continuous data loading andmaintenance based on a rigorous discipline. But I'm still not sure if this is a real-world use case... https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/release-changes.html#GUID-2A2D6E3B-A3FD-47A8-82A3-1EF95AEF5993 -------------------------------------------------- ON STATEMENT refresh mode for materialized views The ON STATEMENT refresh mode refreshes materialized views every time a DML operation is performed on any base table, withoutthe need to commit the transaction. This mode does not require you to maintain materialized view logs on the basetables. -------------------------------------------------- > Another use case is a ticket selling system. The system shows how many > tickets remain in a real time manner. For this purpose it needs to > count the number of tickets already sold from a log table. By using > IVM, it could be accomplished in simple and effective way. Wouldn't the app just have a table like ticket(id, name, quantity), decrement the quantity when the ticket is sold, and readthe current quantity to know the remaining tickets? If many consumers try to buy tickets for a popular event, the materializedview refresh would limit the concurrency. > Here are some use cases suitable for IVM I can think of: > > - Users are creating home made triggers to get data from tables. Since > IVM could eliminates some of those triggers, we could expect less > maintenance cost and bugs accidentally brought in when the triggers > were created. > > - Any use case in which the cost of refreshing whole result table > (materialized view) is so expensive that it justifies the cost of > updating of base tables. See the example of use cases above. I think we need to find a typical example of this. That should be useful to write the manual article, because it's betterto caution users that the IMV is a good fit for this case and not for that case. Using real-world table names in thesyntax example will also be good. > > * Do you think the benefit of ON STATEMENT (i.e. do not have to use > materialized view log) outweighs the drawback of ON STATEMENT (i.g. DML > overhead)? > > Outweights to what? "outweigh" means "exceed." I meant that I'm wondering if and why users prefer ON STATEMENT's benefit despite of its additionaloverhead on update statements. Bottom line: The use of triggers makes me hesitate, because I saw someone's (probably Fujii san) article that INSERTs intoinheritance-and-trigger-based partitioned tables were 10 times slower than the declaration-based partitioned tables. I think I will try to find a good use case. Regards Takayuki Tsunakawa
>> Another use case is a ticket selling system. The system shows how many >> tickets remain in a real time manner. For this purpose it needs to >> count the number of tickets already sold from a log table. By using >> IVM, it could be accomplished in simple and effective way. > > Wouldn't the app just have a table like ticket(id, name, quantity), decrement the quantity when the ticket is sold, andread the current quantity to know the remaining tickets? If many consumers try to buy tickets for a popular event, thematerialized view refresh would limit the concurrency. Yes, as long as number of sold ticks is the only important data for the system, it could be true. However suppose the system wants to start sort of "campaign" and the system needs to collect statistics of counts depending on the city that each ticket buyer belongs to so that certain offer is limited to first 100 ticket buyers in each city. In this case IVM will give more flexible way to handle this kind of requirements than having adhoc city counts column in a table. > I think we need to find a typical example of this. That should be useful to write the manual article, because it's betterto caution users that the IMV is a good fit for this case and not for that case. Using real-world table names in thesyntax example will also be good. In general I agree. I'd try to collect good real-world examples by myself but my experience is limited. I hope people in this community come up with such that examples. > "outweigh" means "exceed." I meant that I'm wondering if and why users prefer ON STATEMENT's benefit despite of its additionaloverhead on update statements. I already found at least one such user in the upthread if I don't missing something. > Bottom line: The use of triggers makes me hesitate, because I saw someone's (probably Fujii san) article that INSERTs intoinheritance-and-trigger-based partitioned tables were 10 times slower than the declaration-based partitioned tables. I think I will try to find a good use case. Great. In the mean time we will try to mitigate the overhead of IVM (triggers are just one of them). Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hi, Attached is the latest patch (v11) to add support for Incremental Materialized View Maintenance (IVM). Differences from the previous patch (v10) include: - Prohibit creating matviews including mutable functions Matviews including mutable functions (for example now(),random(), ... etc) could result in inconsistent data with the basetables. This patch adds a check whether the requested matview definition includes SELECTs using mutable functions. If so, raise anerror while creating the matview. This issue is reported by nuko-san. https://www.postgresql.org/message-id/CAF3Gu1Z950HqQJzwanbeg7PmUXLc+7uZMstfnLeZM9iqDWeW9Q@mail.gmail.com Currently other IVM's support status is: > IVM is a way to make materialized views up-to-date in which only > incremental changes are computed and applied on views rather than > recomputing the contents from scratch as REFRESH MATERIALIZED VIEW > does. IVM can update materialized views more efficiently > than recomputation when only small part of the view need updates. > > There are two approaches with regard to timing of view maintenance: > immediate and deferred. In immediate maintenance, views are updated in > the same transaction where its base table is modified. In deferred > maintenance, views are updated after the transaction is committed, > for example, when the view is accessed, as a response to user command > like REFRESH, or periodically in background, and so on. > > This patch implements a kind of immediate maintenance, in which > materialized views are updated immediately in AFTER triggers when a > base table is modified. > > This supports views using: > - inner and outer joins including self-join > - some built-in aggregate functions (count, sum, agv, min, max) > - a part of subqueries > -- simple subqueries in FROM clause > -- EXISTS subqueries in WHERE clause > - DISTINCT and views with tuple duplicates > > === > Here are major changes we made after the previous submitted patch: > > * Aggregate functions are checked if they can be used in IVM > using their OID. Per comments from Alvaro Herrera. > > For this purpose, Gen_fmgrtab.pl was modified so that OIDs of > aggregate functions are output to fmgroids.h. > > * Some bug fixes including: > > - Mistake of tab-completion of psql pointed out by nuko-san > - A bug relating rename of matview pointed out by nuko-san > - spelling errors > - etc. > > * Add documentations for IVM > > * Patch is splited into eleven parts to make review easier > as suggested by Amit Langote: > > - 0001: Add a new syntax: > CREATE INCREMENTAL MATERIALIZED VIEW > - 0002: Add a new column relisivm to pg_class > - 0003: Change trigger.c to allow to prolong life span of tupestores > containing Transition Tables generated via AFTER trigger > - 0004: Add the basic IVM future using counting algorithm: > This supports inner joins, DISTINCT, and tuple duplicates. > - 0005: Change GEN_fmgrtab.pl to output aggregate function's OIDs > - 0006: Add aggregates support for IVM > - 0007: Add subqueries support for IVM > - 0008: Add outer joins support for IVM > - 0009: Add IVM support to psql command > - 0010: Add regression tests for IVM > - 0011: Add documentations for IVM > > === > Todo: > > Currently, REFRESH and pg_dump/pg_restore is not supported, but > we are working on them. > > Also, TRUNCATE is not supported. When TRUNCATE command is executed > on a base table, nothing occurs on materialized views. We are > now considering another better options, like: > > - Raise an error or warning when a base table is TRUNCATEed. > - Make the view non-scannable (like WITH NO DATA) > - Update the view in some ways. It would be easy for inner joins > or aggregate views, but there is some difficult with outer joins. Best Regards, -- Takuma Hoshiai <hoshiai@sraoss.co.jp>
Attachment
On Tue, 24 Dec 2019 07:07:35 +0000 "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote: > From: Yugo Nagata <nagata@sraoss.co.jp> > > On Mon, 23 Dec 2019 08:08:53 +0000 > > "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote: > > > How about unlogged tables ? I thought the point of using a temp table is to > > avoid WAL overhead. > > > > Hmm... this might be another option. However, if we use unlogged tables, > > we will need to create them in a special schema similar to pg_toast > > to split this from user tables. Otherwise, we need to create and drop > > unlogged tables repeatedly for each session. > > Maybe we can create the work tables in the same schema as the materialized view, following: > > * Prefix the table name to indicate that the table is system-managed, thus alluding to the user that manually deletingthe table would break something. This is like the system attribute __imv_count you are proposing. > > * Describe the above in the manual. Columns of serial and bigserial data type similarly create sequences behind the scenes. > > * Make the work tables depend on the materialized view by recording the dependency in pg_depend, so that Dropping the materializedview will also drop its work tables. Maybe it works, but instead of using special names for work tables, we can also create a schema whose name is special and place work tables in this. This will not annoy users with information they are not interested in when, for example, psql meta-commands like \d are used. Anyway, I understood it is better to avoid creating and dropping temporary tables during view maintenance per statement. -- Yugo Nagata <nagata@sraoss.co.jp>
Hello, Thank you for this patch. I have tried to use an other patch with yours: "Planning counters in pg_stat_statements (using pgss_store)" https://www.postgresql.org/message-id/CAOBaU_Y12bn0tOdN9RMBZn29bfYYH11b2CwKO1RO7dX9fQ3aZA%40mail.gmail.com setting shared_preload_libraries='pg_stat_statements' pg_stat_statements.track=all and creating the extension When trying following syntax: create table b1 (id integer, x numeric(10,3)); create incremental materialized view mv1 as select id, count(*),sum(x) from b1 group by id; insert into b1 values (1,1) I got an ASSERT FAILURE in pg_stat_statements.c on Assert(query != NULL); comming from matview.c refresh_matview_datafill(dest_old, query, queryEnv, NULL); or refresh_matview_datafill(dest_new, query, queryEnv, NULL); If this (last) NULL field was replaced by the query text, a comment or just "n/a", it would fix the problem. Could this be investigated ? Thanks in advance Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Sat, Dec 28, 2019 at 12:42 AM legrand legrand <legrand_legrand@hotmail.com> wrote: > > Hello, > Thank you for this patch. > > I have tried to use an other patch with yours: > "Planning counters in pg_stat_statements (using pgss_store)" > https://www.postgresql.org/message-id/CAOBaU_Y12bn0tOdN9RMBZn29bfYYH11b2CwKO1RO7dX9fQ3aZA%40mail.gmail.com > > setting > shared_preload_libraries='pg_stat_statements' > pg_stat_statements.track=all > and creating the extension > > > When trying following syntax: > > create table b1 (id integer, x numeric(10,3)); > create incremental materialized view mv1 as select id, count(*),sum(x) from > b1 group by id; > insert into b1 values (1,1) > > I got an ASSERT FAILURE in pg_stat_statements.c > on > Assert(query != NULL); > > comming from matview.c > refresh_matview_datafill(dest_old, query, queryEnv, NULL); > or > refresh_matview_datafill(dest_new, query, queryEnv, NULL); > > > If this (last) NULL field was replaced by the query text, a comment or just > "n/a", > it would fix the problem. > > Could this be investigated ? I digged deeper into this. I found a bug in the pg_stat_statements patch, as the new pgss_planner_hook() doesn't check for a non-zero queryId, which I think should avoid that problem. This however indeed raises the question on whether the query text should be provided, and if the behavior is otherwise correct. If I understand correctly, for now this specific query won't go through parse_analysis, thus won't get a queryId and will be ignored in pgss_ExecutorEnd, so it'll be entirely invisible, except with auto_explain which will only show an orphan plan like this: 2019-12-28 12:03:29.334 CET [9399] LOG: duration: 0.180 ms plan: HashAggregate (cost=0.04..0.06 rows=1 width=60) Group Key: new_16385_0.id -> Named Tuplestore Scan (cost=0.00..0.02 rows=1 width=52)
```
[ec2-user@ip-10-0-1-10 ivm]$ psql --version
psql (PostgreSQL) 13devel-ivm-3bf6953688153fa72dd48478a77e37cf3111a1ee
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f limit-problem.sql
DROP TABLE IF EXISTS test CASCADE;
psql:limit-problem.sql:1: NOTICE: drop cascades to materialized view test_imv
DROP TABLE
CREATE TABLE test (id int primary key, data text);
CREATE TABLE
INSERT INTO test VALUES (generate_series(1, 10), 'foo');
INSERT 0 10
CREATE INCREMENTAL MATERIALIZED VIEW test_imv AS SELECT * FROM test LIMIT 1;
SELECT 1
Materialized view "public.test_imv"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
data | text | | | | extended | |
__ivm_count__ | bigint | | | | plain | |
View definition:
SELECT test.id,
test.data
FROM test
LIMIT 1;
Access method: heap
Incremental view maintenance: yes
SELECT * FROM test LIMIT 1;
id | data
----+------
1 | foo
(1 row)
TABLE test_imv;
id | data
----+------
1 | foo
(1 row)
UPDATE test SET data = 'bar' WHERE id = 1;
UPDATE 1
SELECT * FROM test LIMIT 1;
id | data
----+------
2 | foo
(1 row)
TABLE test_imv;
id | data
----+------
1 | bar
(1 row)
DELETE FROM test WHERE id = 1;
DELETE 1
SELECT * FROM test LIMIT 1;
id | data
----+------
2 | foo
(1 row)
TABLE test_imv;
id | data
----+------
(0 rows)
```
ORDER BY clause is not allowed when executing CREATE INCREMENTAL MATELIARIZED VIEW.
Hi,
I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.
I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].
In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.
However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.
For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.
We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.
However, the discussion about IVM is now stoped, so we would like to restart and
progress this.
Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.
1. How to extract changes on base tables
I think there would be at least two approaches for it.
- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding
In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.
If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.
2. How to compute the delta to be applied to materialized views
Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.
There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.
3. How to identify rows to be modifed in materialized views
When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.
4. When to maintain materialized views
There are two candidates of the timing of maintenance, immediate (eager) or deferred.
In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.
In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.
In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].
[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066
Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>
On Sat, 11 Jan 2020 09:27:58 +0900 nuko yokohama <nuko.yokohama@gmail.com> wrote: > LIMIT clause without ORDER BY should be prohibited when creating > incremental materialized views. > > In SQL, the result of a LIMIT clause without ORDER BY is undefined. > If the LIMIT clause is allowed when creating an incremental materialized > view, incorrect results will be obtained when the view is updated after > updating the source table. Thank you for your advice. It's just as you said. LIMIT/OFFSET clause should is prohibited. We will add this to next patch. Best Regards, Takuma Hoshiai > > ``` > [ec2-user@ip-10-0-1-10 ivm]$ psql --version > psql (PostgreSQL) 13devel-ivm-3bf6953688153fa72dd48478a77e37cf3111a1ee > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f limit-problem.sql > DROP TABLE IF EXISTS test CASCADE; > psql:limit-problem.sql:1: NOTICE: drop cascades to materialized view > test_imv > DROP TABLE > CREATE TABLE test (id int primary key, data text); > CREATE TABLE > INSERT INTO test VALUES (generate_series(1, 10), 'foo'); > INSERT 0 10 > CREATE INCREMENTAL MATERIALIZED VIEW test_imv AS SELECT * FROM test LIMIT 1; > SELECT 1 > Materialized view "public.test_imv" > Column | Type | Collation | Nullable | Default | Storage | > Stats target | Description > ---------------+---------+-----------+----------+---------+----------+--------------+------------- > id | integer | | | | plain | > | > data | text | | | | extended | > | > __ivm_count__ | bigint | | | | plain | > | > View definition: > SELECT test.id, > test.data > FROM test > LIMIT 1; > Access method: heap > Incremental view maintenance: yes > > SELECT * FROM test LIMIT 1; > id | data > ----+------ > 1 | foo > (1 row) > > TABLE test_imv; > id | data > ----+------ > 1 | foo > (1 row) > > UPDATE test SET data = 'bar' WHERE id = 1; > UPDATE 1 > SELECT * FROM test LIMIT 1; > id | data > ----+------ > 2 | foo > (1 row) > > TABLE test_imv; > id | data > ----+------ > 1 | bar > (1 row) > > DELETE FROM test WHERE id = 1; > DELETE 1 > SELECT * FROM test LIMIT 1; > id | data > ----+------ > 2 | foo > (1 row) > > TABLE test_imv; > id | data > ----+------ > (0 rows) > ``` > > ORDER BY clause is not allowed when executing CREATE INCREMENTAL > MATELIARIZED VIEW. > We propose not to allow LIMIT clauses as well. > > > 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>: > > > Hi, > > > > I would like to implement Incremental View Maintenance (IVM) on > > PostgreSQL. > > IVM is a technique to maintain materialized views which computes and > > applies > > only the incremental changes to the materialized views rather than > > recomputate the contents as the current REFRESH command does. > > > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > > [1]. > > Our implementation uses row OIDs to compute deltas for materialized > > views. > > The basic idea is that if we have information about which rows in base > > tables > > are contributing to generate a certain row in a matview then we can > > identify > > the affected rows when a base table is updated. This is based on an idea of > > Dr. Masunaga [2] who is a member of our group and inspired from ID-based > > approach[3]. > > > > In our implementation, the mapping of the row OIDs of the materialized view > > and the base tables are stored in "OID map". When a base relation is > > modified, > > AFTER trigger is executed and the delta is recorded in delta tables using > > the transition table feature. The accual udpate of the matview is triggerd > > by REFRESH command with INCREMENTALLY option. > > > > However, we realize problems of our implementation. First, WITH OIDS will > > be removed since PG12, so OIDs are no longer available. Besides this, it > > would > > be hard to implement this since it needs many changes of executor nodes to > > collect base tables's OIDs during execuing a query. Also, the cost of > > maintaining > > OID map would be high. > > > > For these reasons, we started to think to implement IVM without relying on > > OIDs > > and made a bit more surveys. > > > > We also looked at Kevin Grittner's discussion [4] on incremental matview > > maintenance. In this discussion, Kevin proposed to use counting algorithm > > [5] > > to handle projection views (using DISTNICT) properly. This algorithm need > > an > > additional system column, count_t, in materialized views and delta tables > > of > > base tables. > > > > However, the discussion about IVM is now stoped, so we would like to > > restart and > > progress this. > > > > > > Through our PoC inplementation and surveys, I think we need to think at > > least > > the followings for implementing IVM. > > > > 1. How to extract changes on base tables > > > > I think there would be at least two approaches for it. > > > > - Using transition table in AFTER triggers > > - Extracting changes from WAL using logical decoding > > > > In our PoC implementation, we used AFTER trigger and transition tables, > > but using > > logical decoding might be better from the point of performance of base > > table > > modification. > > > > If we can represent a change of UPDATE on a base table as query-like > > rather than > > OLD and NEW, it may be possible to update the materialized view directly > > instead > > of performing delete & insert. > > > > > > 2. How to compute the delta to be applied to materialized views > > > > Essentially, IVM is based on relational algebra. Theorically, changes on > > base > > tables are represented as deltas on this, like "R <- R + dR", and the > > delta on > > the materialized view is computed using base table deltas based on "change > > propagation equations". For implementation, we have to derive the > > equation from > > the view definition query (Query tree, or Plan tree?) and describe this as > > SQL > > query to compulte delta to be applied to the materialized view. > > > > There could be several operations for view definition: selection, > > projection, > > join, aggregation, union, difference, intersection, etc. If we can > > prepare a > > module for each operation, it makes IVM extensable, so we can start a > > simple > > view definition, and then support more complex views. > > > > > > 3. How to identify rows to be modifed in materialized views > > > > When applying the delta to the materialized view, we have to identify > > which row > > in the matview is corresponding to a row in the delta. A naive method is > > matching > > by using all columns in a tuple, but clearly this is unefficient. If > > thematerialized > > view has unique index, we can use this. Maybe, we have to force > > materialized views > > to have all primary key colums in their base tables. In our PoC > > implementation, we > > used OID to identify rows, but this will be no longer available as said > > above. > > > > > > 4. When to maintain materialized views > > > > There are two candidates of the timing of maintenance, immediate (eager) > > or deferred. > > > > In eager maintenance, the materialized view is updated in the same > > transaction > > where the base table is updated. In deferred maintenance, this is done > > after the > > transaction is commited, for example, when view is accessed, as a response > > to user > > request, etc. > > > > In the previous discussion[4], it is planned to start from "eager" > > approach. In our PoC > > implementaion, we used the other aproach, that is, using REFRESH command > > to perform IVM. > > I am not sure which is better as a start point, but I begin to think that > > the eager > > approach may be more simple since we don't have to maintain base table > > changes in other > > past transactions. > > > > In the eager maintenance approache, we have to consider a race condition > > where two > > different transactions change base tables simultaneously as discussed in > > [4]. > > > > > > [1] > > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ > > [2] > > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 > > (Japanese only) > > [3] https://dl.acm.org/citation.cfm?id=2750546 > > [4] > > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com > > [5] https://dl.acm.org/citation.cfm?id=170066 > > > > Regards, > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > > > -- Takuma Hoshiai <hoshiai@sraoss.co.jp>
(commit e150d964df7e3aeb768e4bae35d15764f8abd284)
A SELECT statement using the MIN() and MAX() functions can be executed on a user-defined type column that implements the aggregate functions MIN () and MAX ().
However, if the same SELECT statement is specified in the AS clause of CREATE INCREMENTAL MATERIALIZED VIEW, the following error will occur.
```
SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
data_min | data_max
----------+----------
1/3 | 2/3
(1 row)
CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data) data_min FROM foo;
psql:extension-agg.sql:14: ERROR: aggregate function min is not supported
CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data) data_max FROM foo;
psql:extension-agg.sql:15: ERROR: aggregate function max is not supported
```
Does query including user-defined type aggregate operation not supported by INCREMENTAL MATERIALIZED VIEW?
An execution example is shown below.
```
[ec2-user@ip-10-0-1-10 ivm]$ cat extension-agg.sql
--
-- pg_fraction: https://github.com/nuko-yokohama/pg_fraction
--
DROP EXTENSION IF EXISTS pg_fraction CASCADE;
DROP TABLE IF EXISTS foo CASCADE;
CREATE EXTENSION IF NOT EXISTS pg_fraction;
\dx
\dT+ fraction
CREATE TABLE foo (id int, data fraction);
INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2');
SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data) data_min FROM foo;
CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data) data_max FROM foo;
SELECT MIN(id) id_min, MAX(id) id_max FROM foo;
CREATE INCREMENTAL MATERIALIZED VIEW foo_id_imv AS SELECT MIN(id) id_min, MAX(id) id_max FROM foo;
```
Best regards.
Hi,
I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.
I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].
In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.
However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.
For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.
We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.
However, the discussion about IVM is now stoped, so we would like to restart and
progress this.
Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.
1. How to extract changes on base tables
I think there would be at least two approaches for it.
- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding
In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.
If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.
2. How to compute the delta to be applied to materialized views
Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.
There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.
3. How to identify rows to be modifed in materialized views
When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.
4. When to maintain materialized views
There are two candidates of the timing of maintenance, immediate (eager) or deferred.
In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.
In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.
In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].
[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066
Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>
Create an INCREMENTAL MATERIALIZED VIEW by specifying a query that includes user-defined type columns.
After the view is created, an error occurs when inserting into the view source table (including the user-defined type column).
```
ERROR: operator does not exist
```
An execution example is shown below.
```
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -a -f extension-insert.sql
--
-- pg_fraction: https://github.com/nuko-yokohama/pg_fraction
--
DROP EXTENSION IF EXISTS pg_fraction CASCADE;
psql:extension-insert.sql:4: NOTICE: drop cascades to column data of table foo
DROP EXTENSION
DROP TABLE IF EXISTS foo CASCADE;
DROP TABLE
CREATE EXTENSION IF NOT EXISTS pg_fraction;
CREATE EXTENSION
\dx
List of installed extensions
Name | Version | Schema | Description
-------------+---------+------------+------------------------------
pg_fraction | 1.0 | public | fraction data type
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)
\dT+ fraction
List of data types
Schema | Name | Internal name | Size | Elements | Owner | Access privileges | Description
--------+----------+---------------+------+----------+----------+-------------------+-------------
public | fraction | fraction | 16 | | postgres | |
(1 row)
CREATE TABLE foo (id int, data fraction);
CREATE TABLE
INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2');
INSERT 0 3
SELECT id, data FROM foo WHERE data >= '1/2';
id | data
----+------
1 | 2/3
3 | 1/2
(2 rows)
CREATE INCREMENTAL MATERIALIZED VIEW foo_imv AS SELECT id, data FROM foo WHERE data >= '1/2';
SELECT 2
TABLE foo_imv;
id | data
----+------
1 | 2/3
3 | 1/2
(2 rows)
INSERT INTO foo (id, data) VALUES (4,'2/3'),(5,'2/5'),(6,'3/6'); -- error
psql:extension-insert.sql:17: ERROR: operator does not exist: fraction pg_catalog.= fraction
LINE 1: ...(mv.id IS NULL AND diff.id IS NULL)) AND (mv.data OPERATOR(p...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
QUERY: WITH updt AS (UPDATE public.foo_imv AS mv SET __ivm_count__ = mv.__ivm_count__ OPERATOR(pg_catalog.+) diff.__ivm_count__ FROM pg_temp_3.pg_temp_73900 AS diff WHERE (mv.id OPERATOR(pg_catalog.=) diff.id OR (mv.id IS NULL AND diff.id IS NULL)) AND (mv.data OPERATOR(pg_catalog.=) diff.data OR (mv.data IS NULL AND diff.data IS NULL)) RETURNING mv.id, mv.data) INSERT INTO public.foo_imv SELECT * FROM pg_temp_3.pg_temp_73900 AS diff WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE (mv.id OPERATOR(pg_catalog.=) diff.id OR (mv.id IS NULL AND diff.id IS NULL)) AND (mv.data OPERATOR(pg_catalog.=) diff.data OR (mv.data IS NULL AND diff.data IS NULL)));
TABLE foo;
id | data
----+------
1 | 2/3
2 | 1/3
3 | 1/2
(3 rows)
TABLE foo_imv;
id | data
----+------
1 | 2/3
3 | 1/2
(2 rows)
DROP MATERIALIZED VIEW foo_imv;
DROP MATERIALIZED VIEW
INSERT INTO foo (id, data) VALUES (4,'2/3'),(5,'2/5'),(6,'3/6');
INSERT 0 3
TABLE foo;
id | data
----+------
1 | 2/3
2 | 1/3
3 | 1/2
4 | 2/3
5 | 2/5
6 | 1/2
(6 rows)
```
Best regards.
Hi,
I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.
I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].
In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.
However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.
For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.
We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.
However, the discussion about IVM is now stoped, so we would like to restart and
progress this.
Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.
1. How to extract changes on base tables
I think there would be at least two approaches for it.
- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding
In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.
If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.
2. How to compute the delta to be applied to materialized views
Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.
There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.
3. How to identify rows to be modifed in materialized views
When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.
4. When to maintain materialized views
There are two candidates of the timing of maintenance, immediate (eager) or deferred.
In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.
In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.
In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].
[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066
Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>
On Thu, 16 Jan 2020 12:59:11 +0900 nuko yokohama <nuko.yokohama@gmail.com> wrote: > Aggregate operation of user-defined type cannot be specified > (commit e150d964df7e3aeb768e4bae35d15764f8abd284) > > A SELECT statement using the MIN() and MAX() functions can be executed on a > user-defined type column that implements the aggregate functions MIN () and > MAX (). > However, if the same SELECT statement is specified in the AS clause of > CREATE INCREMENTAL MATERIALIZED VIEW, the following error will occur. > > ``` > SELECT MIN(data) data_min, MAX(data) data_max FROM foo; > data_min | data_max > ----------+---------- > 1/3 | 2/3 > (1 row) > > CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data) > data_min FROM foo; > psql:extension-agg.sql:14: ERROR: aggregate function min is not supported > CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data) > data_max FROM foo; > psql:extension-agg.sql:15: ERROR: aggregate function max is not supported > ``` > > Does query including user-defined type aggregate operation not supported by > INCREMENTAL MATERIALIZED VIEW? The current implementation supports only built-in aggregate functions, so user-defined aggregates are not supported, although it is allowed before. This is because we can not know how user-defined aggregates behave and if it can work safely with IVM. Min/Max on your fraction type may work well, but it is possible that some user-defined aggregate functions named min or max behave in totally different way than we expected. In future, maybe it is possible support user-defined aggregates are supported by extending pg_aggregate and adding support functions for IVM, but there is not still a concrete plan for now. BTW, the following error message doesn't look good because built-in min is supported, so I will improve it. ERROR: aggregate function min is not supported Regards, Yugo Nagata > > An execution example is shown below. > > ``` > [ec2-user@ip-10-0-1-10 ivm]$ cat extension-agg.sql > -- > -- pg_fraction: https://github.com/nuko-yokohama/pg_fraction > -- > DROP EXTENSION IF EXISTS pg_fraction CASCADE; > DROP TABLE IF EXISTS foo CASCADE; > > CREATE EXTENSION IF NOT EXISTS pg_fraction; > \dx > \dT+ fraction > > CREATE TABLE foo (id int, data fraction); > INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2'); > SELECT MIN(data) data_min, MAX(data) data_max FROM foo; > CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data) > data_min FROM foo; > CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data) > data_max FROM foo; > > SELECT MIN(id) id_min, MAX(id) id_max FROM foo; > CREATE INCREMENTAL MATERIALIZED VIEW foo_id_imv AS SELECT MIN(id) id_min, > MAX(id) id_max FROM foo; > ``` > > Best regards. > > 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>: > > > Hi, > > > > I would like to implement Incremental View Maintenance (IVM) on > > PostgreSQL. > > IVM is a technique to maintain materialized views which computes and > > applies > > only the incremental changes to the materialized views rather than > > recomputate the contents as the current REFRESH command does. > > > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > > [1]. > > Our implementation uses row OIDs to compute deltas for materialized > > views. > > The basic idea is that if we have information about which rows in base > > tables > > are contributing to generate a certain row in a matview then we can > > identify > > the affected rows when a base table is updated. This is based on an idea of > > Dr. Masunaga [2] who is a member of our group and inspired from ID-based > > approach[3]. > > > > In our implementation, the mapping of the row OIDs of the materialized view > > and the base tables are stored in "OID map". When a base relation is > > modified, > > AFTER trigger is executed and the delta is recorded in delta tables using > > the transition table feature. The accual udpate of the matview is triggerd > > by REFRESH command with INCREMENTALLY option. > > > > However, we realize problems of our implementation. First, WITH OIDS will > > be removed since PG12, so OIDs are no longer available. Besides this, it > > would > > be hard to implement this since it needs many changes of executor nodes to > > collect base tables's OIDs during execuing a query. Also, the cost of > > maintaining > > OID map would be high. > > > > For these reasons, we started to think to implement IVM without relying on > > OIDs > > and made a bit more surveys. > > > > We also looked at Kevin Grittner's discussion [4] on incremental matview > > maintenance. In this discussion, Kevin proposed to use counting algorithm > > [5] > > to handle projection views (using DISTNICT) properly. This algorithm need > > an > > additional system column, count_t, in materialized views and delta tables > > of > > base tables. > > > > However, the discussion about IVM is now stoped, so we would like to > > restart and > > progress this. > > > > > > Through our PoC inplementation and surveys, I think we need to think at > > least > > the followings for implementing IVM. > > > > 1. How to extract changes on base tables > > > > I think there would be at least two approaches for it. > > > > - Using transition table in AFTER triggers > > - Extracting changes from WAL using logical decoding > > > > In our PoC implementation, we used AFTER trigger and transition tables, > > but using > > logical decoding might be better from the point of performance of base > > table > > modification. > > > > If we can represent a change of UPDATE on a base table as query-like > > rather than > > OLD and NEW, it may be possible to update the materialized view directly > > instead > > of performing delete & insert. > > > > > > 2. How to compute the delta to be applied to materialized views > > > > Essentially, IVM is based on relational algebra. Theorically, changes on > > base > > tables are represented as deltas on this, like "R <- R + dR", and the > > delta on > > the materialized view is computed using base table deltas based on "change > > propagation equations". For implementation, we have to derive the > > equation from > > the view definition query (Query tree, or Plan tree?) and describe this as > > SQL > > query to compulte delta to be applied to the materialized view. > > > > There could be several operations for view definition: selection, > > projection, > > join, aggregation, union, difference, intersection, etc. If we can > > prepare a > > module for each operation, it makes IVM extensable, so we can start a > > simple > > view definition, and then support more complex views. > > > > > > 3. How to identify rows to be modifed in materialized views > > > > When applying the delta to the materialized view, we have to identify > > which row > > in the matview is corresponding to a row in the delta. A naive method is > > matching > > by using all columns in a tuple, but clearly this is unefficient. If > > thematerialized > > view has unique index, we can use this. Maybe, we have to force > > materialized views > > to have all primary key colums in their base tables. In our PoC > > implementation, we > > used OID to identify rows, but this will be no longer available as said > > above. > > > > > > 4. When to maintain materialized views > > > > There are two candidates of the timing of maintenance, immediate (eager) > > or deferred. > > > > In eager maintenance, the materialized view is updated in the same > > transaction > > where the base table is updated. In deferred maintenance, this is done > > after the > > transaction is commited, for example, when view is accessed, as a response > > to user > > request, etc. > > > > In the previous discussion[4], it is planned to start from "eager" > > approach. In our PoC > > implementaion, we used the other aproach, that is, using REFRESH command > > to perform IVM. > > I am not sure which is better as a start point, but I begin to think that > > the eager > > approach may be more simple since we don't have to maintain base table > > changes in other > > past transactions. > > > > In the eager maintenance approache, we have to consider a race condition > > where two > > different transactions change base tables simultaneously as discussed in > > [4]. > > > > > > [1] > > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ > > [2] > > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 > > (Japanese only) > > [3] https://dl.acm.org/citation.cfm?id=2750546 > > [4] > > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com > > [5] https://dl.acm.org/citation.cfm?id=170066 > > > > Regards, > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > > > -- Yugo NAGATA <nagata@sraoss.co.jp>
On Thu, 16 Jan 2020 18:50:40 +0900 nuko yokohama <nuko.yokohama@gmail.com> wrote: > Error occurs when updating user-defined type columns. > > Create an INCREMENTAL MATERIALIZED VIEW by specifying a query that includes > user-defined type columns. > After the view is created, an error occurs when inserting into the view > source table (including the user-defined type column). > ``` > ERROR: operator does not exist Thank you for your reporting. I think this error occurs because pg_catalog.= is used also for user-defined types. I will fix this. Regards, Yugo Nagata > ``` > > An execution example is shown below. > > ``` > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -a -f extension-insert.sql > -- > -- pg_fraction: https://github.com/nuko-yokohama/pg_fraction > -- > DROP EXTENSION IF EXISTS pg_fraction CASCADE; > psql:extension-insert.sql:4: NOTICE: drop cascades to column data of table > foo > DROP EXTENSION > DROP TABLE IF EXISTS foo CASCADE; > DROP TABLE > CREATE EXTENSION IF NOT EXISTS pg_fraction; > CREATE EXTENSION > \dx > List of installed extensions > Name | Version | Schema | Description > -------------+---------+------------+------------------------------ > pg_fraction | 1.0 | public | fraction data type > plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language > (2 rows) > > \dT+ fraction > List of data types > Schema | Name | Internal name | Size | Elements | Owner | Access > privileges | Description > --------+----------+---------------+------+----------+----------+-------------------+------------- > public | fraction | fraction | 16 | | postgres | > | > (1 row) > > CREATE TABLE foo (id int, data fraction); > CREATE TABLE > INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2'); > INSERT 0 3 > SELECT id, data FROM foo WHERE data >= '1/2'; > id | data > ----+------ > 1 | 2/3 > 3 | 1/2 > (2 rows) > > CREATE INCREMENTAL MATERIALIZED VIEW foo_imv AS SELECT id, data FROM foo > WHERE data >= '1/2'; > SELECT 2 > TABLE foo_imv; > id | data > ----+------ > 1 | 2/3 > 3 | 1/2 > (2 rows) > > INSERT INTO foo (id, data) VALUES (4,'2/3'),(5,'2/5'),(6,'3/6'); -- error > psql:extension-insert.sql:17: ERROR: operator does not exist: fraction > pg_catalog.= fraction > LINE 1: ...(mv.id IS NULL AND diff.id IS NULL)) AND (mv.data OPERATOR(p... > ^ > HINT: No operator matches the given name and argument types. You might > need to add explicit type casts. > QUERY: WITH updt AS (UPDATE public.foo_imv AS mv SET __ivm_count__ = > mv.__ivm_count__ OPERATOR(pg_catalog.+) diff.__ivm_count__ FROM > pg_temp_3.pg_temp_73900 AS diff WHERE (mv.id OPERATOR(pg_catalog.=) diff.id > OR (mv.id IS NULL AND diff.id IS NULL)) AND (mv.data OPERATOR(pg_catalog.=) > diff.data OR (mv.data IS NULL AND diff.data IS NULL)) RETURNING mv.id, > mv.data) INSERT INTO public.foo_imv SELECT * FROM pg_temp_3.pg_temp_73900 > AS diff WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE (mv.id > OPERATOR(pg_catalog.=) diff.id OR (mv.id IS NULL AND diff.id IS NULL)) AND > (mv.data OPERATOR(pg_catalog.=) diff.data OR (mv.data IS NULL AND diff.data > IS NULL))); > TABLE foo; > id | data > ----+------ > 1 | 2/3 > 2 | 1/3 > 3 | 1/2 > (3 rows) > > TABLE foo_imv; > id | data > ----+------ > 1 | 2/3 > 3 | 1/2 > (2 rows) > > DROP MATERIALIZED VIEW foo_imv; > DROP MATERIALIZED VIEW > INSERT INTO foo (id, data) VALUES (4,'2/3'),(5,'2/5'),(6,'3/6'); > INSERT 0 3 > TABLE foo; > id | data > ----+------ > 1 | 2/3 > 2 | 1/3 > 3 | 1/2 > 4 | 2/3 > 5 | 2/5 > 6 | 1/2 > (6 rows) > > ``` > > Best regards. > > 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>: > > > Hi, > > > > I would like to implement Incremental View Maintenance (IVM) on > > PostgreSQL. > > IVM is a technique to maintain materialized views which computes and > > applies > > only the incremental changes to the materialized views rather than > > recomputate the contents as the current REFRESH command does. > > > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > > [1]. > > Our implementation uses row OIDs to compute deltas for materialized > > views. > > The basic idea is that if we have information about which rows in base > > tables > > are contributing to generate a certain row in a matview then we can > > identify > > the affected rows when a base table is updated. This is based on an idea of > > Dr. Masunaga [2] who is a member of our group and inspired from ID-based > > approach[3]. > > > > In our implementation, the mapping of the row OIDs of the materialized view > > and the base tables are stored in "OID map". When a base relation is > > modified, > > AFTER trigger is executed and the delta is recorded in delta tables using > > the transition table feature. The accual udpate of the matview is triggerd > > by REFRESH command with INCREMENTALLY option. > > > > However, we realize problems of our implementation. First, WITH OIDS will > > be removed since PG12, so OIDs are no longer available. Besides this, it > > would > > be hard to implement this since it needs many changes of executor nodes to > > collect base tables's OIDs during execuing a query. Also, the cost of > > maintaining > > OID map would be high. > > > > For these reasons, we started to think to implement IVM without relying on > > OIDs > > and made a bit more surveys. > > > > We also looked at Kevin Grittner's discussion [4] on incremental matview > > maintenance. In this discussion, Kevin proposed to use counting algorithm > > [5] > > to handle projection views (using DISTNICT) properly. This algorithm need > > an > > additional system column, count_t, in materialized views and delta tables > > of > > base tables. > > > > However, the discussion about IVM is now stoped, so we would like to > > restart and > > progress this. > > > > > > Through our PoC inplementation and surveys, I think we need to think at > > least > > the followings for implementing IVM. > > > > 1. How to extract changes on base tables > > > > I think there would be at least two approaches for it. > > > > - Using transition table in AFTER triggers > > - Extracting changes from WAL using logical decoding > > > > In our PoC implementation, we used AFTER trigger and transition tables, > > but using > > logical decoding might be better from the point of performance of base > > table > > modification. > > > > If we can represent a change of UPDATE on a base table as query-like > > rather than > > OLD and NEW, it may be possible to update the materialized view directly > > instead > > of performing delete & insert. > > > > > > 2. How to compute the delta to be applied to materialized views > > > > Essentially, IVM is based on relational algebra. Theorically, changes on > > base > > tables are represented as deltas on this, like "R <- R + dR", and the > > delta on > > the materialized view is computed using base table deltas based on "change > > propagation equations". For implementation, we have to derive the > > equation from > > the view definition query (Query tree, or Plan tree?) and describe this as > > SQL > > query to compulte delta to be applied to the materialized view. > > > > There could be several operations for view definition: selection, > > projection, > > join, aggregation, union, difference, intersection, etc. If we can > > prepare a > > module for each operation, it makes IVM extensable, so we can start a > > simple > > view definition, and then support more complex views. > > > > > > 3. How to identify rows to be modifed in materialized views > > > > When applying the delta to the materialized view, we have to identify > > which row > > in the matview is corresponding to a row in the delta. A naive method is > > matching > > by using all columns in a tuple, but clearly this is unefficient. If > > thematerialized > > view has unique index, we can use this. Maybe, we have to force > > materialized views > > to have all primary key colums in their base tables. In our PoC > > implementation, we > > used OID to identify rows, but this will be no longer available as said > > above. > > > > > > 4. When to maintain materialized views > > > > There are two candidates of the timing of maintenance, immediate (eager) > > or deferred. > > > > In eager maintenance, the materialized view is updated in the same > > transaction > > where the base table is updated. In deferred maintenance, this is done > > after the > > transaction is commited, for example, when view is accessed, as a response > > to user > > request, etc. > > > > In the previous discussion[4], it is planned to start from "eager" > > approach. In our PoC > > implementaion, we used the other aproach, that is, using REFRESH command > > to perform IVM. > > I am not sure which is better as a start point, but I begin to think that > > the eager > > approach may be more simple since we don't have to maintain base table > > changes in other > > past transactions. > > > > In the eager maintenance approache, we have to consider a race condition > > where two > > different transactions change base tables simultaneously as discussed in > > [4]. > > > > > > [1] > > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ > > [2] > > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 > > (Japanese only) > > [3] https://dl.acm.org/citation.cfm?id=2750546 > > [4] > > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com > > [5] https://dl.acm.org/citation.cfm?id=170066 > > > > Regards, > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > > > -- Yugo NAGATA <nagata@sraoss.co.jp>
Hello, It seems that patch v11 doesn't apply any more. Problem with "scanRTEForColumn" maybe because of change: https://git.postgresql.org/pg/commitdiff/b541e9accb28c90656388a3f827ca3a68dd2a308 Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
I understand.
Even if the function name is min, there is a possibility that it is not an aggregation operation for finding the minimum value, so it is restricted.
I understood aggregation of user-defined types is a constraint.
Also, I agree with the error message improvements.
On Thu, 16 Jan 2020 12:59:11 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:
> Aggregate operation of user-defined type cannot be specified
> (commit e150d964df7e3aeb768e4bae35d15764f8abd284)
>
> A SELECT statement using the MIN() and MAX() functions can be executed on a
> user-defined type column that implements the aggregate functions MIN () and
> MAX ().
> However, if the same SELECT statement is specified in the AS clause of
> CREATE INCREMENTAL MATERIALIZED VIEW, the following error will occur.
>
> ```
> SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
> data_min | data_max
> ----------+----------
> 1/3 | 2/3
> (1 row)
>
> CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data)
> data_min FROM foo;
> psql:extension-agg.sql:14: ERROR: aggregate function min is not supported
> CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data)
> data_max FROM foo;
> psql:extension-agg.sql:15: ERROR: aggregate function max is not supported
> ```
>
> Does query including user-defined type aggregate operation not supported by
> INCREMENTAL MATERIALIZED VIEW?
The current implementation supports only built-in aggregate functions, so
user-defined aggregates are not supported, although it is allowed before.
This is because we can not know how user-defined aggregates behave and if
it can work safely with IVM. Min/Max on your fraction type may work well,
but it is possible that some user-defined aggregate functions named min
or max behave in totally different way than we expected.
In future, maybe it is possible support user-defined aggregates are supported
by extending pg_aggregate and adding support functions for IVM, but there is
not still a concrete plan for now.
BTW, the following error message doesn't look good because built-in min is
supported, so I will improve it.
ERROR: aggregate function min is not supported
Regards,
Yugo Nagata
>
> An execution example is shown below.
>
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ cat extension-agg.sql
> --
> -- pg_fraction: https://github.com/nuko-yokohama/pg_fraction
> --
> DROP EXTENSION IF EXISTS pg_fraction CASCADE;
> DROP TABLE IF EXISTS foo CASCADE;
>
> CREATE EXTENSION IF NOT EXISTS pg_fraction;
> \dx
> \dT+ fraction
>
> CREATE TABLE foo (id int, data fraction);
> INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2');
> SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
> CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data)
> data_min FROM foo;
> CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data)
> data_max FROM foo;
>
> SELECT MIN(id) id_min, MAX(id) id_max FROM foo;
> CREATE INCREMENTAL MATERIALIZED VIEW foo_id_imv AS SELECT MIN(id) id_min,
> MAX(id) id_max FROM foo;
> ```
>
> Best regards.
>
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
>
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance. In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> > - Using transition table in AFTER triggers
> > - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations". For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join, aggregation, union, difference, intersection, etc. If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta. A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables. In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >
> >
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Fri, 17 Jan 2020 14:10:32 -0700 (MST) legrand legrand <legrand_legrand@hotmail.com> wrote: > Hello, > > It seems that patch v11 doesn't apply any more. > Problem with "scanRTEForColumn" maybe because of change: Thank you for your reporting! We will fix this in the next update. Regards, Yugo Nagata > > https://git.postgresql.org/pg/commitdiff/b541e9accb28c90656388a3f827ca3a68dd2a308 > > Regards > PAscal > > > > -- > Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html > > -- Yugo NAGATA <nagata@sraoss.co.jp>
On Mon, 20 Jan 2020 16:57:58 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Fri, 17 Jan 2020 14:10:32 -0700 (MST) > legrand legrand <legrand_legrand@hotmail.com> wrote: > > > Hello, > > > > It seems that patch v11 doesn't apply any more. > > Problem with "scanRTEForColumn" maybe because of change: > > Thank you for your reporting! We will fix this in the next update. Although I have been working conflict fix and merge latest master, it takes a little longer, because it has large impact than we thought. Please wait a little more. Regards Takuma Hoshiai > Regards, > Yugo Nagata > > > > > https://git.postgresql.org/pg/commitdiff/b541e9accb28c90656388a3f827ca3a68dd2a308 > > > > Regards > > PAscal > > > > > > > > -- > > Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html > > > > > > > -- > Yugo NAGATA <nagata@sraoss.co.jp> > > > -- Takuma Hoshiai <hoshiai@sraoss.co.jp>
Hi, Attached is the latest patch (v12) to add support for Incremental Materialized View Maintenance (IVM). It is possible to apply to current latest master branch. Differences from the previous patch (v11) include: * support executing REFRESH MATERIALIZED VIEW command with IVM. * support unscannable state by WITH NO DATA option. * add a check for LIMIT/OFFSET at creating an IMMV If REFRESH is executed for IMMV (incremental maintainable materialized view), its contents is re-calculated as same asusual materialized views (full REFRESH). Although IMMV is basically keeping up-to-date data, rounding errors can be accumulatedin aggregated value in some cases, for example, if the view contains sum/avg on float type columns. Running REFRESHcommand on IMMV will resolve this. Also, WITH NO DATA option allows to make IMMV unscannable. At that time, IVM triggersare dropped from IMMV because these become unneeded and useless. Also, we added new deptype option 'm' in pg_depend view for checking a trigger is for IVM. Please tell me, if add new deptypeoption is unacceptable. It is also possible to perform the check by referencing pg_depend and pg_trigger, pg_procview instead of adding a new deptype. We update IVM restrictions. LIMIT/OFFSET clause is not supported with iVM because it is not suitable for incremental changesto the materialized view. This issue is reported by nuko-san. https://www.postgresql.org/message-id/CAF3Gu1ZK-s9GQh=70n8+21rBL8+fKW4tV3Ce-xuFXMsNFPO+zQ@mail.gmail.com Best Regards, Takuma Hoshiai On Mon, 27 Jan 2020 09:19:05 +0900 Takuma Hoshiai <hoshiai@sraoss.co.jp> wrote: > On Mon, 20 Jan 2020 16:57:58 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > On Fri, 17 Jan 2020 14:10:32 -0700 (MST) > > legrand legrand <legrand_legrand@hotmail.com> wrote: > > > > > Hello, > > > > > > It seems that patch v11 doesn't apply any more. > > > Problem with "scanRTEForColumn" maybe because of change: > > > > Thank you for your reporting! We will fix this in the next update. > > Although I have been working conflict fix and merge latest master, it > takes a little longer, because it has large impact than we thought. > > Please wait a little more. > > Regards > Takuma Hoshiai > > > > Regards, > > Yugo Nagata > > > > > > > > https://git.postgresql.org/pg/commitdiff/b541e9accb28c90656388a3f827ca3a68dd2a308 > > > > > > Regards > > > PAscal > > > > > > > > > > > > -- > > > Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html > > > > > > > > > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > > > > > > > > -- > Takuma Hoshiai <hoshiai@sraoss.co.jp> > > > > -- Takuma Hoshiai <hoshiai@sraoss.co.jp>
Attachment
If ROW LEVEL SECURITY is set for the source table after creating the INCREMENTAL MATELIALIZED VIEW, the search results by that are not reflected.
After setting ROW LEVEL SECURITY (similar to normal MATERIALIZED VIEW), you need to execute REFRESH MATERILALIZED VIEW and reflect the result.
(Not limited to this, but in general cases where search results change by means other than DML)
I propose to add this note to the document (rules.sgml).
execute log.
```
[ec2-user@ip-10-0-1-10 rls]$ psql testdb -e -f rls.sql
CREATE USER user_a;
CREATE ROLE
CREATE TABLE test (id int, data text);
CREATE TABLE
GRANT ALL ON TABLE test TO user_a;
GRANT
GRANT ALL ON SCHEMA public TO user_a;
GRANT
SET ROLE user_a;
SET
INSERT INTO test VALUES (1,'A'),(2,'B'),(3,'C');
INSERT 0 3
SELECT * FROM test;
id | data
----+------
1 | A
2 | B
3 | C
(3 rows)
CREATE VIEW test_v AS SELECT * FROM test;
CREATE VIEW
CREATE MATERIALIZED VIEW test_mv AS SELECT * FROM test;
SELECT 3
CREATE INCREMENTAL MATERIALIZED VIEW test_imv AS SELECT * FROM test;
SELECT 3
SELECT * FROM test_v;
id | data
----+------
1 | A
2 | B
3 | C
(3 rows)
SELECT * FROM test_mv;
id | data
----+------
1 | A
2 | B
3 | C
(3 rows)
SELECT * FROM test_imv;
id | data
----+------
3 | C
1 | A
2 | B
(3 rows)
RESET ROLE;
RESET
CREATE POLICY test_AAA ON test FOR SELECT TO user_a USING (data = 'A');
CREATE POLICY
ALTER TABLE test ENABLE ROW LEVEL SECURITY ;
ALTER TABLE
SET ROLE user_a;
SET
SELECT * FROM test_v;
id | data
----+------
1 | A
(1 row)
SELECT * FROM test_mv;
id | data
----+------
1 | A
2 | B
3 | C
(3 rows)
SELECT * FROM test_imv;
id | data
----+------
3 | C
1 | A
2 | B
(3 rows)
REFRESH MATERIALIZED VIEW test_mv;
REFRESH MATERIALIZED VIEW
REFRESH MATERIALIZED VIEW test_imv;
REFRESH MATERIALIZED VIEW
SELECT * FROM test_mv;
id | data
----+------
1 | A
(1 row)
SELECT * FROM test_imv;
id | data
----+------
1 | A
(1 row)
RESET ROLE;
RESET
REVOKE ALL ON TABLE test FROM user_a;
REVOKE
REVOKE ALL ON TABLE test_v FROM user_a;
REVOKE
REVOKE ALL ON TABLE test_mv FROM user_a;
REVOKE
REVOKE ALL ON TABLE test_imv FROM user_a;
REVOKE
REVOKE ALL ON SCHEMA public FROM user_a;
REVOKE
DROP TABLE test CASCADE;
psql:rls.sql:40: NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to view test_v
drop cascades to materialized view test_mv
drop cascades to materialized view test_imv
DROP TABLE
DROP USER user_a;
DROP ROLE
[ec2-user@ip-10-0-1-10 rls]$
Hi,
I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.
I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].
In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.
However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.
For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.
We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.
However, the discussion about IVM is now stoped, so we would like to restart and
progress this.
Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.
1. How to extract changes on base tables
I think there would be at least two approaches for it.
- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding
In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.
If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.
2. How to compute the delta to be applied to materialized views
Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.
There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.
3. How to identify rows to be modifed in materialized views
When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.
4. When to maintain materialized views
There are two candidates of the timing of maintenance, immediate (eager) or deferred.
In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.
In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.
In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].
[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066
Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>
On Tue, 4 Feb 2020 18:40:45 +0900 nuko yokohama <nuko.yokohama@gmail.com> wrote: > "ROW LEVEL SECURITY" and INCREMENTAL MATERIALIZED VIEW. > > Hi. > > If ROW LEVEL SECURITY is set for the source table after creating the > INCREMENTAL MATELIALIZED VIEW, the search results by that are not reflected. > After setting ROW LEVEL SECURITY (similar to normal MATERIALIZED VIEW), you > need to execute REFRESH MATERILALIZED VIEW and reflect the result. > (Not limited to this, but in general cases where search results change by > means other than DML) > > I propose to add this note to the document (rules.sgml). Thank you for your suggestion! We'll add some description about this to the documentation. Regards, Yugo Nagata > > execute log. > > ``` > [ec2-user@ip-10-0-1-10 rls]$ psql testdb -e -f rls.sql > CREATE USER user_a; > CREATE ROLE > CREATE TABLE test (id int, data text); > CREATE TABLE > GRANT ALL ON TABLE test TO user_a; > GRANT > GRANT ALL ON SCHEMA public TO user_a; > GRANT > SET ROLE user_a; > SET > INSERT INTO test VALUES (1,'A'),(2,'B'),(3,'C'); > INSERT 0 3 > SELECT * FROM test; > id | data > ----+------ > 1 | A > 2 | B > 3 | C > (3 rows) > > CREATE VIEW test_v AS SELECT * FROM test; > CREATE VIEW > CREATE MATERIALIZED VIEW test_mv AS SELECT * FROM test; > SELECT 3 > CREATE INCREMENTAL MATERIALIZED VIEW test_imv AS SELECT * FROM test; > SELECT 3 > SELECT * FROM test_v; > id | data > ----+------ > 1 | A > 2 | B > 3 | C > (3 rows) > > SELECT * FROM test_mv; > id | data > ----+------ > 1 | A > 2 | B > 3 | C > (3 rows) > > SELECT * FROM test_imv; > id | data > ----+------ > 3 | C > 1 | A > 2 | B > (3 rows) > > RESET ROLE; > RESET > CREATE POLICY test_AAA ON test FOR SELECT TO user_a USING (data = 'A'); > CREATE POLICY > ALTER TABLE test ENABLE ROW LEVEL SECURITY ; > ALTER TABLE > SET ROLE user_a; > SET > SELECT * FROM test_v; > id | data > ----+------ > 1 | A > (1 row) > > SELECT * FROM test_mv; > id | data > ----+------ > 1 | A > 2 | B > 3 | C > (3 rows) > > SELECT * FROM test_imv; > id | data > ----+------ > 3 | C > 1 | A > 2 | B > (3 rows) > > REFRESH MATERIALIZED VIEW test_mv; > REFRESH MATERIALIZED VIEW > REFRESH MATERIALIZED VIEW test_imv; > REFRESH MATERIALIZED VIEW > SELECT * FROM test_mv; > id | data > ----+------ > 1 | A > (1 row) > > SELECT * FROM test_imv; > id | data > ----+------ > 1 | A > (1 row) > > RESET ROLE; > RESET > REVOKE ALL ON TABLE test FROM user_a; > REVOKE > REVOKE ALL ON TABLE test_v FROM user_a; > REVOKE > REVOKE ALL ON TABLE test_mv FROM user_a; > REVOKE > REVOKE ALL ON TABLE test_imv FROM user_a; > REVOKE > REVOKE ALL ON SCHEMA public FROM user_a; > REVOKE > DROP TABLE test CASCADE; > psql:rls.sql:40: NOTICE: drop cascades to 3 other objects > DETAIL: drop cascades to view test_v > drop cascades to materialized view test_mv > drop cascades to materialized view test_imv > DROP TABLE > DROP USER user_a; > DROP ROLE > [ec2-user@ip-10-0-1-10 rls]$ > > ``` > > Regard. > > 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>: > > > Hi, > > > > I would like to implement Incremental View Maintenance (IVM) on > > PostgreSQL. > > IVM is a technique to maintain materialized views which computes and > > applies > > only the incremental changes to the materialized views rather than > > recomputate the contents as the current REFRESH command does. > > > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > > [1]. > > Our implementation uses row OIDs to compute deltas for materialized > > views. > > The basic idea is that if we have information about which rows in base > > tables > > are contributing to generate a certain row in a matview then we can > > identify > > the affected rows when a base table is updated. This is based on an idea of > > Dr. Masunaga [2] who is a member of our group and inspired from ID-based > > approach[3]. > > > > In our implementation, the mapping of the row OIDs of the materialized view > > and the base tables are stored in "OID map". When a base relation is > > modified, > > AFTER trigger is executed and the delta is recorded in delta tables using > > the transition table feature. The accual udpate of the matview is triggerd > > by REFRESH command with INCREMENTALLY option. > > > > However, we realize problems of our implementation. First, WITH OIDS will > > be removed since PG12, so OIDs are no longer available. Besides this, it > > would > > be hard to implement this since it needs many changes of executor nodes to > > collect base tables's OIDs during execuing a query. Also, the cost of > > maintaining > > OID map would be high. > > > > For these reasons, we started to think to implement IVM without relying on > > OIDs > > and made a bit more surveys. > > > > We also looked at Kevin Grittner's discussion [4] on incremental matview > > maintenance. In this discussion, Kevin proposed to use counting algorithm > > [5] > > to handle projection views (using DISTNICT) properly. This algorithm need > > an > > additional system column, count_t, in materialized views and delta tables > > of > > base tables. > > > > However, the discussion about IVM is now stoped, so we would like to > > restart and > > progress this. > > > > > > Through our PoC inplementation and surveys, I think we need to think at > > least > > the followings for implementing IVM. > > > > 1. How to extract changes on base tables > > > > I think there would be at least two approaches for it. > > > > - Using transition table in AFTER triggers > > - Extracting changes from WAL using logical decoding > > > > In our PoC implementation, we used AFTER trigger and transition tables, > > but using > > logical decoding might be better from the point of performance of base > > table > > modification. > > > > If we can represent a change of UPDATE on a base table as query-like > > rather than > > OLD and NEW, it may be possible to update the materialized view directly > > instead > > of performing delete & insert. > > > > > > 2. How to compute the delta to be applied to materialized views > > > > Essentially, IVM is based on relational algebra. Theorically, changes on > > base > > tables are represented as deltas on this, like "R <- R + dR", and the > > delta on > > the materialized view is computed using base table deltas based on "change > > propagation equations". For implementation, we have to derive the > > equation from > > the view definition query (Query tree, or Plan tree?) and describe this as > > SQL > > query to compulte delta to be applied to the materialized view. > > > > There could be several operations for view definition: selection, > > projection, > > join, aggregation, union, difference, intersection, etc. If we can > > prepare a > > module for each operation, it makes IVM extensable, so we can start a > > simple > > view definition, and then support more complex views. > > > > > > 3. How to identify rows to be modifed in materialized views > > > > When applying the delta to the materialized view, we have to identify > > which row > > in the matview is corresponding to a row in the delta. A naive method is > > matching > > by using all columns in a tuple, but clearly this is unefficient. If > > thematerialized > > view has unique index, we can use this. Maybe, we have to force > > materialized views > > to have all primary key colums in their base tables. In our PoC > > implementation, we > > used OID to identify rows, but this will be no longer available as said > > above. > > > > > > 4. When to maintain materialized views > > > > There are two candidates of the timing of maintenance, immediate (eager) > > or deferred. > > > > In eager maintenance, the materialized view is updated in the same > > transaction > > where the base table is updated. In deferred maintenance, this is done > > after the > > transaction is commited, for example, when view is accessed, as a response > > to user > > request, etc. > > > > In the previous discussion[4], it is planned to start from "eager" > > approach. In our PoC > > implementaion, we used the other aproach, that is, using REFRESH command > > to perform IVM. > > I am not sure which is better as a start point, but I begin to think that > > the eager > > approach may be more simple since we don't have to maintain base table > > changes in other > > past transactions. > > > > In the eager maintenance approache, we have to consider a race condition > > where two > > different transactions change base tables simultaneously as discussed in > > [4]. > > > > > > [1] > > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ > > [2] > > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 > > (Japanese only) > > [3] https://dl.acm.org/citation.cfm?id=2750546 > > [4] > > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com > > [5] https://dl.acm.org/citation.cfm?id=170066 > > > > Regards, > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > > > -- Yugo NAGATA <nagata@sraoss.co.jp>
When creating an INCREMENTAL MATERIALIZED VIEW,
the server process crashes if you specify a query with a UNION.
(commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
execute log.
```
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
DROP TABLE IF EXISTS table_x CASCADE;
psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
DROP TABLE
DROP TABLE IF EXISTS table_y CASCADE;
DROP TABLE
CREATE TABLE table_x (id int, data numeric);
CREATE TABLE
CREATE TABLE table_y (id int, data numeric);
CREATE TABLE
INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
SELECT * FROM table_x;
id | data
----+--------------------
1 | 0.950724735058774
2 | 0.0222670808201144
3 | 0.391258547114841
(3 rows)
SELECT * FROM table_y;
id | data
----+--------------------
1 | 0.991717347778337
2 | 0.0528458947672874
3 | 0.965044982911163
(3 rows)
CREATE VIEW xy_union_v AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
CREATE VIEW
TABLE xy_union_v;
name | id | data
---------+----+--------------------
table_y | 2 | 0.0528458947672874
table_x | 2 | 0.0222670808201144
table_y | 3 | 0.965044982911163
table_x | 1 | 0.950724735058774
table_x | 3 | 0.391258547114841
table_y | 1 | 0.991717347778337
(6 rows)
CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
psql:union_query_crash.sql:28: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:union_query_crash.sql:28: fatal: connection to server was lost
```
UNION query problem.(server crash)
When creating an INCREMENTAL MATERIALIZED VIEW,
the server process crashes if you specify a query with a UNION.
(commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
execute log.
```
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
DROP TABLE IF EXISTS table_x CASCADE;
psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
DROP TABLE
DROP TABLE IF EXISTS table_y CASCADE;
DROP TABLE
CREATE TABLE table_x (id int, data numeric);
CREATE TABLE
CREATE TABLE table_y (id int, data numeric);
CREATE TABLE
INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
SELECT * FROM table_x;
id | data
----+--------------------
1 | 0.950724735058774
2 | 0.0222670808201144
3 | 0.391258547114841
(3 rows)
SELECT * FROM table_y;
id | data
----+--------------------
1 | 0.991717347778337
2 | 0.0528458947672874
3 | 0.965044982911163
(3 rows)
CREATE VIEW xy_union_v AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
CREATE VIEW
TABLE xy_union_v;
name | id | data
---------+----+--------------------
table_y | 2 | 0.0528458947672874
table_x | 2 | 0.0222670808201144
table_y | 3 | 0.965044982911163
table_x | 1 | 0.950724735058774
table_x | 3 | 0.391258547114841
table_y | 1 | 0.991717347778337
(6 rows)
CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
psql:union_query_crash.sql:28: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:union_query_crash.sql:28: fatal: connection to server was lost
```
Hi,
I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.
I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].
In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.
However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.
For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.
We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.
However, the discussion about IVM is now stoped, so we would like to restart and
progress this.
Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.
1. How to extract changes on base tables
I think there would be at least two approaches for it.
- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding
In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.
If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.
2. How to compute the delta to be applied to materialized views
Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.
There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.
3. How to identify rows to be modifed in materialized views
When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.
4. When to maintain materialized views
There are two candidates of the timing of maintenance, immediate (eager) or deferred.
In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.
In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.
In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].
[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066
Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>
On Sat, 8 Feb 2020 11:15:45 +0900 nuko yokohama <nuko.yokohama@gmail.com> wrote: > Hi. > > UNION query problem.(server crash) > > When creating an INCREMENTAL MATERIALIZED VIEW, > the server process crashes if you specify a query with a UNION. Thank you for your report. As you noticed set operations including UNION is concurrently unsupported, although this is not checked at definition time and not documented either. Now we are thoroughly investigating unsupported queries, and will add checks and documentations for them. Regards, Yugo Nagata > > (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e) > > execute log. > > ``` > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql > DROP TABLE IF EXISTS table_x CASCADE; > psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v > DROP TABLE > DROP TABLE IF EXISTS table_y CASCADE; > DROP TABLE > CREATE TABLE table_x (id int, data numeric); > CREATE TABLE > CREATE TABLE table_y (id int, data numeric); > CREATE TABLE > INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric); > INSERT 0 3 > INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric); > INSERT 0 3 > SELECT * FROM table_x; > id | data > ----+-------------------- > 1 | 0.950724735058774 > 2 | 0.0222670808201144 > 3 | 0.391258547114841 > (3 rows) > > SELECT * FROM table_y; > id | data > ----+-------------------- > 1 | 0.991717347778337 > 2 | 0.0528458947672874 > 3 | 0.965044982911163 > (3 rows) > > CREATE VIEW xy_union_v AS > SELECT 'table_x' AS name, * FROM table_x > UNION > SELECT 'table_y' AS name, * FROM table_y > ; > CREATE VIEW > TABLE xy_union_v; > name | id | data > ---------+----+-------------------- > table_y | 2 | 0.0528458947672874 > table_x | 2 | 0.0222670808201144 > table_y | 3 | 0.965044982911163 > table_x | 1 | 0.950724735058774 > table_x | 3 | 0.391258547114841 > table_y | 1 | 0.991717347778337 > (6 rows) > > CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS > SELECT 'table_x' AS name, * FROM table_x > UNION > SELECT 'table_y' AS name, * FROM table_y > ; > psql:union_query_crash.sql:28: server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > psql:union_query_crash.sql:28: fatal: connection to server was lost > ``` > UNION query problem.(server crash) > > When creating an INCREMENTAL MATERIALIZED VIEW, > the server process crashes if you specify a query with a UNION. > > (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e) > > execute log. > > ``` > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql > DROP TABLE IF EXISTS table_x CASCADE; > psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v > DROP TABLE > DROP TABLE IF EXISTS table_y CASCADE; > DROP TABLE > CREATE TABLE table_x (id int, data numeric); > CREATE TABLE > CREATE TABLE table_y (id int, data numeric); > CREATE TABLE > INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric); > INSERT 0 3 > INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric); > INSERT 0 3 > SELECT * FROM table_x; > id | data > ----+-------------------- > 1 | 0.950724735058774 > 2 | 0.0222670808201144 > 3 | 0.391258547114841 > (3 rows) > > SELECT * FROM table_y; > id | data > ----+-------------------- > 1 | 0.991717347778337 > 2 | 0.0528458947672874 > 3 | 0.965044982911163 > (3 rows) > > CREATE VIEW xy_union_v AS > SELECT 'table_x' AS name, * FROM table_x > UNION > SELECT 'table_y' AS name, * FROM table_y > ; > CREATE VIEW > TABLE xy_union_v; > name | id | data > ---------+----+-------------------- > table_y | 2 | 0.0528458947672874 > table_x | 2 | 0.0222670808201144 > table_y | 3 | 0.965044982911163 > table_x | 1 | 0.950724735058774 > table_x | 3 | 0.391258547114841 > table_y | 1 | 0.991717347778337 > (6 rows) > > CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS > SELECT 'table_x' AS name, * FROM table_x > UNION > SELECT 'table_y' AS name, * FROM table_y > ; > psql:union_query_crash.sql:28: server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > psql:union_query_crash.sql:28: fatal: connection to server was lost > ``` > > 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>: > > > Hi, > > > > I would like to implement Incremental View Maintenance (IVM) on > > PostgreSQL. > > IVM is a technique to maintain materialized views which computes and > > applies > > only the incremental changes to the materialized views rather than > > recomputate the contents as the current REFRESH command does. > > > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > > [1]. > > Our implementation uses row OIDs to compute deltas for materialized > > views. > > The basic idea is that if we have information about which rows in base > > tables > > are contributing to generate a certain row in a matview then we can > > identify > > the affected rows when a base table is updated. This is based on an idea of > > Dr. Masunaga [2] who is a member of our group and inspired from ID-based > > approach[3]. > > > > In our implementation, the mapping of the row OIDs of the materialized view > > and the base tables are stored in "OID map". When a base relation is > > modified, > > AFTER trigger is executed and the delta is recorded in delta tables using > > the transition table feature. The accual udpate of the matview is triggerd > > by REFRESH command with INCREMENTALLY option. > > > > However, we realize problems of our implementation. First, WITH OIDS will > > be removed since PG12, so OIDs are no longer available. Besides this, it > > would > > be hard to implement this since it needs many changes of executor nodes to > > collect base tables's OIDs during execuing a query. Also, the cost of > > maintaining > > OID map would be high. > > > > For these reasons, we started to think to implement IVM without relying on > > OIDs > > and made a bit more surveys. > > > > We also looked at Kevin Grittner's discussion [4] on incremental matview > > maintenance. In this discussion, Kevin proposed to use counting algorithm > > [5] > > to handle projection views (using DISTNICT) properly. This algorithm need > > an > > additional system column, count_t, in materialized views and delta tables > > of > > base tables. > > > > However, the discussion about IVM is now stoped, so we would like to > > restart and > > progress this. > > > > > > Through our PoC inplementation and surveys, I think we need to think at > > least > > the followings for implementing IVM. > > > > 1. How to extract changes on base tables > > > > I think there would be at least two approaches for it. > > > > - Using transition table in AFTER triggers > > - Extracting changes from WAL using logical decoding > > > > In our PoC implementation, we used AFTER trigger and transition tables, > > but using > > logical decoding might be better from the point of performance of base > > table > > modification. > > > > If we can represent a change of UPDATE on a base table as query-like > > rather than > > OLD and NEW, it may be possible to update the materialized view directly > > instead > > of performing delete & insert. > > > > > > 2. How to compute the delta to be applied to materialized views > > > > Essentially, IVM is based on relational algebra. Theorically, changes on > > base > > tables are represented as deltas on this, like "R <- R + dR", and the > > delta on > > the materialized view is computed using base table deltas based on "change > > propagation equations". For implementation, we have to derive the > > equation from > > the view definition query (Query tree, or Plan tree?) and describe this as > > SQL > > query to compulte delta to be applied to the materialized view. > > > > There could be several operations for view definition: selection, > > projection, > > join, aggregation, union, difference, intersection, etc. If we can > > prepare a > > module for each operation, it makes IVM extensable, so we can start a > > simple > > view definition, and then support more complex views. > > > > > > 3. How to identify rows to be modifed in materialized views > > > > When applying the delta to the materialized view, we have to identify > > which row > > in the matview is corresponding to a row in the delta. A naive method is > > matching > > by using all columns in a tuple, but clearly this is unefficient. If > > thematerialized > > view has unique index, we can use this. Maybe, we have to force > > materialized views > > to have all primary key colums in their base tables. In our PoC > > implementation, we > > used OID to identify rows, but this will be no longer available as said > > above. > > > > > > 4. When to maintain materialized views > > > > There are two candidates of the timing of maintenance, immediate (eager) > > or deferred. > > > > In eager maintenance, the materialized view is updated in the same > > transaction > > where the base table is updated. In deferred maintenance, this is done > > after the > > transaction is commited, for example, when view is accessed, as a response > > to user > > request, etc. > > > > In the previous discussion[4], it is planned to start from "eager" > > approach. In our PoC > > implementaion, we used the other aproach, that is, using REFRESH command > > to perform IVM. > > I am not sure which is better as a start point, but I begin to think that > > the eager > > approach may be more simple since we don't have to maintain base table > > changes in other > > past transactions. > > > > In the eager maintenance approache, we have to consider a race condition > > where two > > different transactions change base tables simultaneously as discussed in > > [4]. > > > > > > [1] > > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ > > [2] > > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 > > (Japanese only) > > [3] https://dl.acm.org/citation.cfm?id=2750546 > > [4] > > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com > > [5] https://dl.acm.org/citation.cfm?id=170066 > > > > Regards, > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > > > -- Yugo NAGATA <nagata@sraoss.co.jp>
I also refer to the implementation of "./src/backend/commands/createas.c" check_ivm_restriction_walker () to see if there are any other queries that may be problematic.
On Sat, 8 Feb 2020 11:15:45 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:
> Hi.
>
> UNION query problem.(server crash)
>
> When creating an INCREMENTAL MATERIALIZED VIEW,
> the server process crashes if you specify a query with a UNION.
Thank you for your report. As you noticed set operations including
UNION is concurrently unsupported, although this is not checked at
definition time and not documented either. Now we are thoroughly
investigating unsupported queries, and will add checks and
documentations for them.
Regards,
Yugo Nagata
>
> (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
>
> execute log.
>
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> DROP TABLE IF EXISTS table_x CASCADE;
> psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
> DROP TABLE
> DROP TABLE IF EXISTS table_y CASCADE;
> DROP TABLE
> CREATE TABLE table_x (id int, data numeric);
> CREATE TABLE
> CREATE TABLE table_y (id int, data numeric);
> CREATE TABLE
> INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> SELECT * FROM table_x;
> id | data
> ----+--------------------
> 1 | 0.950724735058774
> 2 | 0.0222670808201144
> 3 | 0.391258547114841
> (3 rows)
>
> SELECT * FROM table_y;
> id | data
> ----+--------------------
> 1 | 0.991717347778337
> 2 | 0.0528458947672874
> 3 | 0.965044982911163
> (3 rows)
>
> CREATE VIEW xy_union_v AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> CREATE VIEW
> TABLE xy_union_v;
> name | id | data
> ---------+----+--------------------
> table_y | 2 | 0.0528458947672874
> table_x | 2 | 0.0222670808201144
> table_y | 3 | 0.965044982911163
> table_x | 1 | 0.950724735058774
> table_x | 3 | 0.391258547114841
> table_y | 1 | 0.991717347778337
> (6 rows)
>
> CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> psql:union_query_crash.sql:28: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> psql:union_query_crash.sql:28: fatal: connection to server was lost
> ```
> UNION query problem.(server crash)
>
> When creating an INCREMENTAL MATERIALIZED VIEW,
> the server process crashes if you specify a query with a UNION.
>
> (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
>
> execute log.
>
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> DROP TABLE IF EXISTS table_x CASCADE;
> psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
> DROP TABLE
> DROP TABLE IF EXISTS table_y CASCADE;
> DROP TABLE
> CREATE TABLE table_x (id int, data numeric);
> CREATE TABLE
> CREATE TABLE table_y (id int, data numeric);
> CREATE TABLE
> INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> SELECT * FROM table_x;
> id | data
> ----+--------------------
> 1 | 0.950724735058774
> 2 | 0.0222670808201144
> 3 | 0.391258547114841
> (3 rows)
>
> SELECT * FROM table_y;
> id | data
> ----+--------------------
> 1 | 0.991717347778337
> 2 | 0.0528458947672874
> 3 | 0.965044982911163
> (3 rows)
>
> CREATE VIEW xy_union_v AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> CREATE VIEW
> TABLE xy_union_v;
> name | id | data
> ---------+----+--------------------
> table_y | 2 | 0.0528458947672874
> table_x | 2 | 0.0222670808201144
> table_y | 3 | 0.965044982911163
> table_x | 1 | 0.950724735058774
> table_x | 3 | 0.391258547114841
> table_y | 1 | 0.991717347778337
> (6 rows)
>
> CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> psql:union_query_crash.sql:28: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> psql:union_query_crash.sql:28: fatal: connection to server was lost
> ```
>
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
>
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance. In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> > - Using transition table in AFTER triggers
> > - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations". For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join, aggregation, union, difference, intersection, etc. If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta. A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables. In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >
> >
--
Yugo NAGATA <nagata@sraoss.co.jp>
Hi, Attached is the latest patch (v13) to add support for Incremental View Maintenance (IVM). Differences from the previous patch (v12) include: * Allow to maintain IMMVs containing user defined types Previously, IMMVs (Incrementally Maintainable Materialized Views) containing user defined types could not be maintained and an error was raised because such columns were compared using pg_calatog.= during tuple matching. To fix this, use the column type's default equality operator instead of forcing to use the built-in operator. Pointed out by nuko-san. https://www.postgresql.org/message-id/CAF3Gu1YL7HWF0Veor3t8sQD%2BJnvozHe6WdUw0YsMqJGFezVhpg%40mail.gmail.com * Improve an error message for unspoorted aggregate functions Currentlly only built-in aggregate functions are supported, so aggregates on user-defined types causes an error at view definition time. However, the message was unappropreate like: ERROR: aggregate function max is not supported even though built-in max is supported. Therefore, this is improved to include its argument types as following: ERROR: aggregate function min(xxx) is not supported HINT: IVM supports only built-in aggregate functions. Pointed out by nuko-san. https://www.postgresql.org/message-id/CAF3Gu1bP0eiv%3DCqV%3D%2BxATdcmLypjjudLz_wdJgnRNULpiX9GrA%40mail.gmail.com * Doc: fix description of support subquery IVM supports regular EXISTS clause not only correlated subqueries. Regards, Yugo Nagata On Fri, 20 Dec 2019 14:02:32 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > IVM is a way to make materialized views up-to-date in which only > incremental changes are computed and applied on views rather than > recomputing the contents from scratch as REFRESH MATERIALIZED VIEW > does. IVM can update materialized views more efficiently > than recomputation when only small part of the view need updates. > > There are two approaches with regard to timing of view maintenance: > immediate and deferred. In immediate maintenance, views are updated in > the same transaction where its base table is modified. In deferred > maintenance, views are updated after the transaction is committed, > for example, when the view is accessed, as a response to user command > like REFRESH, or periodically in background, and so on. > > This patch implements a kind of immediate maintenance, in which > materialized views are updated immediately in AFTER triggers when a > base table is modified. > > This supports views using: > - inner and outer joins including self-join > - some built-in aggregate functions (count, sum, agv, min, max) > - a part of subqueries > -- simple subqueries in FROM clause > -- EXISTS subqueries in WHERE clause > - DISTINCT and views with tuple duplicates Regareds, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
Takuma Hoshiai wrote > Hi, > > Attached is the latest patch (v12) to add support for Incremental > Materialized View Maintenance (IVM). > It is possible to apply to current latest master branch. > > Differences from the previous patch (v11) include: > * support executing REFRESH MATERIALIZED VIEW command with IVM. > * support unscannable state by WITH NO DATA option. > * add a check for LIMIT/OFFSET at creating an IMMV > > If REFRESH is executed for IMMV (incremental maintainable materialized > view), its contents is re-calculated as same as usual materialized views > (full REFRESH). Although IMMV is basically keeping up-to-date data, > rounding errors can be accumulated in aggregated value in some cases, for > example, if the view contains sum/avg on float type columns. Running > REFRESH command on IMMV will resolve this. Also, WITH NO DATA option > allows to make IMMV unscannable. At that time, IVM triggers are dropped > from IMMV because these become unneeded and useless. > > [...] Hello, regarding syntax REFRESH MATERIALIZED VIEW x WITH NO DATA I understand that triggers are removed from the source tables, transforming the INCREMENTAL MATERIALIZED VIEW into a(n unscannable) MATERIALIZED VIEW. postgres=# refresh materialized view imv with no data; REFRESH MATERIALIZED VIEW postgres=# select * from imv; ERROR: materialized view "imv" has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. This operation seems to me more of an ALTER command than a REFRESH ONE. Wouldn't the syntax ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET WITH NO DATA or SET WITHOUT DATA be better ? Continuing into this direction, did you ever think about an other feature like: ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET { NOINCREMENTAL } or even SET { NOINCREMENTAL | INCREMENTAL | INCREMENTAL CONCURRENTLY } that would permit to switch between those modes and would keep frozen data available in the materialized view during heavy operations on source tables ? Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Hi PAscal, On Tue, 11 Feb 2020 15:04:12 -0700 (MST) legrand legrand <legrand_legrand@hotmail.com> wrote: > > regarding syntax REFRESH MATERIALIZED VIEW x WITH NO DATA > > I understand that triggers are removed from the source tables, transforming > the INCREMENTAL MATERIALIZED VIEW into a(n unscannable) MATERIALIZED VIEW. > > postgres=# refresh materialized view imv with no data; > REFRESH MATERIALIZED VIEW > postgres=# select * from imv; > ERROR: materialized view "imv" has not been populated > HINT: Use the REFRESH MATERIALIZED VIEW command. > > This operation seems to me more of an ALTER command than a REFRESH ONE. > > Wouldn't the syntax > ALTER MATERIALIZED VIEW [ IF EXISTS ] name > SET WITH NO DATA > or > SET WITHOUT DATA > be better ? We use "REFRESH ... WITH NO DATA" because there is already the syntax to make materialized views non-scannable. We are just following in this. https://www.postgresql.org/docs/12/sql-refreshmaterializedview.html > > Continuing into this direction, did you ever think about an other feature > like: > ALTER MATERIALIZED VIEW [ IF EXISTS ] name > SET { NOINCREMENTAL } > or even > SET { NOINCREMENTAL | INCREMENTAL | INCREMENTAL CONCURRENTLY } > > that would permit to switch between those modes and would keep frozen data > available in the materialized view during heavy operations on source tables > ? Thank you for your suggestion! I agree that the feature to switch between normal materialized view and incrementally maintainable view is useful. We will add this to our ToDo list. Regarding its syntax, I would not like to add new keyword like NONINCREMENTAL, so how about the following ALTER MATERIALIZED VIEW ... SET {WITH | WITHOUT} INCREMENTAL REFRESH although this is just a idea and we will need discussion on it. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Yugo Nagata wrote > Thank you for your suggestion! I agree that the feature to switch between > normal materialized view and incrementally maintainable view is useful. > We will add this to our ToDo list. Regarding its syntax, > I would not like to add new keyword like NONINCREMENTAL, so how about > the following > > ALTER MATERIALIZED VIEW ... SET {WITH | WITHOUT} INCREMENTAL REFRESH > > although this is just a idea and we will need discussion on it. Thanks I will follow that discussion on GitHub https://github.com/sraoss/pgsql-ivm/issues/79 Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
SELECT statements with a TABLESAMPLE clause should be rejected.
Currently, CREATE INCREMENTAL MATERIALIZED VIEW allows SELECT statements with the TABLESAMPLE clause.
However, the result of this SELECT statement is undefined and should be rejected when specified in CREATE INCREMENTAL MATERIALIZED VIEW.
(similar to handling non-immutable functions)
Hi.UNION query problem.(server crash)
When creating an INCREMENTAL MATERIALIZED VIEW,
the server process crashes if you specify a query with a UNION.
(commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
execute log.
```
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
DROP TABLE IF EXISTS table_x CASCADE;
psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
DROP TABLE
DROP TABLE IF EXISTS table_y CASCADE;
DROP TABLE
CREATE TABLE table_x (id int, data numeric);
CREATE TABLE
CREATE TABLE table_y (id int, data numeric);
CREATE TABLE
INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
SELECT * FROM table_x;
id | data
----+--------------------
1 | 0.950724735058774
2 | 0.0222670808201144
3 | 0.391258547114841
(3 rows)
SELECT * FROM table_y;
id | data
----+--------------------
1 | 0.991717347778337
2 | 0.0528458947672874
3 | 0.965044982911163
(3 rows)
CREATE VIEW xy_union_v AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
CREATE VIEW
TABLE xy_union_v;
name | id | data
---------+----+--------------------
table_y | 2 | 0.0528458947672874
table_x | 2 | 0.0222670808201144
table_y | 3 | 0.965044982911163
table_x | 1 | 0.950724735058774
table_x | 3 | 0.391258547114841
table_y | 1 | 0.991717347778337
(6 rows)
CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
psql:union_query_crash.sql:28: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:union_query_crash.sql:28: fatal: connection to server was lost
```
UNION query problem.(server crash)
When creating an INCREMENTAL MATERIALIZED VIEW,
the server process crashes if you specify a query with a UNION.
(commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
execute log.
```
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
DROP TABLE IF EXISTS table_x CASCADE;
psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
DROP TABLE
DROP TABLE IF EXISTS table_y CASCADE;
DROP TABLE
CREATE TABLE table_x (id int, data numeric);
CREATE TABLE
CREATE TABLE table_y (id int, data numeric);
CREATE TABLE
INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
SELECT * FROM table_x;
id | data
----+--------------------
1 | 0.950724735058774
2 | 0.0222670808201144
3 | 0.391258547114841
(3 rows)
SELECT * FROM table_y;
id | data
----+--------------------
1 | 0.991717347778337
2 | 0.0528458947672874
3 | 0.965044982911163
(3 rows)
CREATE VIEW xy_union_v AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
CREATE VIEW
TABLE xy_union_v;
name | id | data
---------+----+--------------------
table_y | 2 | 0.0528458947672874
table_x | 2 | 0.0222670808201144
table_y | 3 | 0.965044982911163
table_x | 1 | 0.950724735058774
table_x | 3 | 0.391258547114841
table_y | 1 | 0.991717347778337
(6 rows)
CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
psql:union_query_crash.sql:28: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:union_query_crash.sql:28: fatal: connection to server was lost
```2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:Hi,
I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.
I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].
In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.
However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.
For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.
We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.
However, the discussion about IVM is now stoped, so we would like to restart and
progress this.
Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.
1. How to extract changes on base tables
I think there would be at least two approaches for it.
- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding
In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.
If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.
2. How to compute the delta to be applied to materialized views
Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.
There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.
3. How to identify rows to be modifed in materialized views
When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.
4. When to maintain materialized views
There are two candidates of the timing of maintenance, immediate (eager) or deferred.
In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.
In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.
In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].
[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066
Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>
On Tue, 18 Feb 2020 22:03:47 +0900 nuko yokohama <nuko.yokohama@gmail.com> wrote: > Hi. > > SELECT statements with a TABLESAMPLE clause should be rejected. > > Currently, CREATE INCREMENTAL MATERIALIZED VIEW allows SELECT statements > with the TABLESAMPLE clause. > However, the result of this SELECT statement is undefined and should be > rejected when specified in CREATE INCREMENTAL MATERIALIZED VIEW. > (similar to handling non-immutable functions) Thanks! We totally agree with you. We are now working on improvement of query checks at creating IMMV. TABLESAMPLE will also be checked in this. Regards, Yugo Nagata > Regard. > > 2020年2月8日(土) 11:15 nuko yokohama <nuko.yokohama@gmail.com>: > > > Hi. > > > > UNION query problem.(server crash) > > > > When creating an INCREMENTAL MATERIALIZED VIEW, > > the server process crashes if you specify a query with a UNION. > > > > (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e) > > > > execute log. > > > > ``` > > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql > > DROP TABLE IF EXISTS table_x CASCADE; > > psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v > > DROP TABLE > > DROP TABLE IF EXISTS table_y CASCADE; > > DROP TABLE > > CREATE TABLE table_x (id int, data numeric); > > CREATE TABLE > > CREATE TABLE table_y (id int, data numeric); > > CREATE TABLE > > INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric); > > INSERT 0 3 > > INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric); > > INSERT 0 3 > > SELECT * FROM table_x; > > id | data > > ----+-------------------- > > 1 | 0.950724735058774 > > 2 | 0.0222670808201144 > > 3 | 0.391258547114841 > > (3 rows) > > > > SELECT * FROM table_y; > > id | data > > ----+-------------------- > > 1 | 0.991717347778337 > > 2 | 0.0528458947672874 > > 3 | 0.965044982911163 > > (3 rows) > > > > CREATE VIEW xy_union_v AS > > SELECT 'table_x' AS name, * FROM table_x > > UNION > > SELECT 'table_y' AS name, * FROM table_y > > ; > > CREATE VIEW > > TABLE xy_union_v; > > name | id | data > > ---------+----+-------------------- > > table_y | 2 | 0.0528458947672874 > > table_x | 2 | 0.0222670808201144 > > table_y | 3 | 0.965044982911163 > > table_x | 1 | 0.950724735058774 > > table_x | 3 | 0.391258547114841 > > table_y | 1 | 0.991717347778337 > > (6 rows) > > > > CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS > > SELECT 'table_x' AS name, * FROM table_x > > UNION > > SELECT 'table_y' AS name, * FROM table_y > > ; > > psql:union_query_crash.sql:28: server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > psql:union_query_crash.sql:28: fatal: connection to server was lost > > ``` > > UNION query problem.(server crash) > > > > When creating an INCREMENTAL MATERIALIZED VIEW, > > the server process crashes if you specify a query with a UNION. > > > > (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e) > > > > execute log. > > > > ``` > > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql > > DROP TABLE IF EXISTS table_x CASCADE; > > psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v > > DROP TABLE > > DROP TABLE IF EXISTS table_y CASCADE; > > DROP TABLE > > CREATE TABLE table_x (id int, data numeric); > > CREATE TABLE > > CREATE TABLE table_y (id int, data numeric); > > CREATE TABLE > > INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric); > > INSERT 0 3 > > INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric); > > INSERT 0 3 > > SELECT * FROM table_x; > > id | data > > ----+-------------------- > > 1 | 0.950724735058774 > > 2 | 0.0222670808201144 > > 3 | 0.391258547114841 > > (3 rows) > > > > SELECT * FROM table_y; > > id | data > > ----+-------------------- > > 1 | 0.991717347778337 > > 2 | 0.0528458947672874 > > 3 | 0.965044982911163 > > (3 rows) > > > > CREATE VIEW xy_union_v AS > > SELECT 'table_x' AS name, * FROM table_x > > UNION > > SELECT 'table_y' AS name, * FROM table_y > > ; > > CREATE VIEW > > TABLE xy_union_v; > > name | id | data > > ---------+----+-------------------- > > table_y | 2 | 0.0528458947672874 > > table_x | 2 | 0.0222670808201144 > > table_y | 3 | 0.965044982911163 > > table_x | 1 | 0.950724735058774 > > table_x | 3 | 0.391258547114841 > > table_y | 1 | 0.991717347778337 > > (6 rows) > > > > CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS > > SELECT 'table_x' AS name, * FROM table_x > > UNION > > SELECT 'table_y' AS name, * FROM table_y > > ; > > psql:union_query_crash.sql:28: server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > psql:union_query_crash.sql:28: fatal: connection to server was lost > > ``` > > > > 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>: > > > >> Hi, > >> > >> I would like to implement Incremental View Maintenance (IVM) on > >> PostgreSQL. > >> IVM is a technique to maintain materialized views which computes and > >> applies > >> only the incremental changes to the materialized views rather than > >> recomputate the contents as the current REFRESH command does. > >> > >> I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > >> [1]. > >> Our implementation uses row OIDs to compute deltas for materialized > >> views. > >> The basic idea is that if we have information about which rows in base > >> tables > >> are contributing to generate a certain row in a matview then we can > >> identify > >> the affected rows when a base table is updated. This is based on an idea > >> of > >> Dr. Masunaga [2] who is a member of our group and inspired from ID-based > >> approach[3]. > >> > >> In our implementation, the mapping of the row OIDs of the materialized > >> view > >> and the base tables are stored in "OID map". When a base relation is > >> modified, > >> AFTER trigger is executed and the delta is recorded in delta tables using > >> the transition table feature. The accual udpate of the matview is triggerd > >> by REFRESH command with INCREMENTALLY option. > >> > >> However, we realize problems of our implementation. First, WITH OIDS will > >> be removed since PG12, so OIDs are no longer available. Besides this, it > >> would > >> be hard to implement this since it needs many changes of executor nodes to > >> collect base tables's OIDs during execuing a query. Also, the cost of > >> maintaining > >> OID map would be high. > >> > >> For these reasons, we started to think to implement IVM without relying > >> on OIDs > >> and made a bit more surveys. > >> > >> We also looked at Kevin Grittner's discussion [4] on incremental matview > >> maintenance. In this discussion, Kevin proposed to use counting > >> algorithm [5] > >> to handle projection views (using DISTNICT) properly. This algorithm need > >> an > >> additional system column, count_t, in materialized views and delta tables > >> of > >> base tables. > >> > >> However, the discussion about IVM is now stoped, so we would like to > >> restart and > >> progress this. > >> > >> > >> Through our PoC inplementation and surveys, I think we need to think at > >> least > >> the followings for implementing IVM. > >> > >> 1. How to extract changes on base tables > >> > >> I think there would be at least two approaches for it. > >> > >> - Using transition table in AFTER triggers > >> - Extracting changes from WAL using logical decoding > >> > >> In our PoC implementation, we used AFTER trigger and transition tables, > >> but using > >> logical decoding might be better from the point of performance of base > >> table > >> modification. > >> > >> If we can represent a change of UPDATE on a base table as query-like > >> rather than > >> OLD and NEW, it may be possible to update the materialized view directly > >> instead > >> of performing delete & insert. > >> > >> > >> 2. How to compute the delta to be applied to materialized views > >> > >> Essentially, IVM is based on relational algebra. Theorically, changes on > >> base > >> tables are represented as deltas on this, like "R <- R + dR", and the > >> delta on > >> the materialized view is computed using base table deltas based on "change > >> propagation equations". For implementation, we have to derive the > >> equation from > >> the view definition query (Query tree, or Plan tree?) and describe this > >> as SQL > >> query to compulte delta to be applied to the materialized view. > >> > >> There could be several operations for view definition: selection, > >> projection, > >> join, aggregation, union, difference, intersection, etc. If we can > >> prepare a > >> module for each operation, it makes IVM extensable, so we can start a > >> simple > >> view definition, and then support more complex views. > >> > >> > >> 3. How to identify rows to be modifed in materialized views > >> > >> When applying the delta to the materialized view, we have to identify > >> which row > >> in the matview is corresponding to a row in the delta. A naive method is > >> matching > >> by using all columns in a tuple, but clearly this is unefficient. If > >> thematerialized > >> view has unique index, we can use this. Maybe, we have to force > >> materialized views > >> to have all primary key colums in their base tables. In our PoC > >> implementation, we > >> used OID to identify rows, but this will be no longer available as said > >> above. > >> > >> > >> 4. When to maintain materialized views > >> > >> There are two candidates of the timing of maintenance, immediate (eager) > >> or deferred. > >> > >> In eager maintenance, the materialized view is updated in the same > >> transaction > >> where the base table is updated. In deferred maintenance, this is done > >> after the > >> transaction is commited, for example, when view is accessed, as a > >> response to user > >> request, etc. > >> > >> In the previous discussion[4], it is planned to start from "eager" > >> approach. In our PoC > >> implementaion, we used the other aproach, that is, using REFRESH command > >> to perform IVM. > >> I am not sure which is better as a start point, but I begin to think that > >> the eager > >> approach may be more simple since we don't have to maintain base table > >> changes in other > >> past transactions. > >> > >> In the eager maintenance approache, we have to consider a race condition > >> where two > >> different transactions change base tables simultaneously as discussed in > >> [4]. > >> > >> > >> [1] > >> https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ > >> [2] > >> https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 > >> (Japanese only) > >> [3] https://dl.acm.org/citation.cfm?id=2750546 > >> [4] > >> https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com > >> [5] https://dl.acm.org/citation.cfm?id=170066 > >> > >> Regards, > >> -- > >> Yugo Nagata <nagata@sraoss.co.jp> > >> > >> -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi, Attached is the latest patch (v14) to add support for Incremental Materialized View Maintenance (IVM). It is possible to apply to current latest master branch. Differences from the previous patch (v13) include: * Support base tables using RLS If a table has the Row Level Security (RLS) policy, IMMV is updated based on the view owner's policy when a base table is updated. However, when a policy of base table is changed or created after creating IMMV, IMMV is not updated based on the new RLS policy. In this case, REFRESH command must be executed. * Use ENR instead of temporary tables for internal operation Previously, IVM create and use a temporary tables to store view delta rows. However it caused out of shared memory, and Tom Lane pointed out that using temp tables in IVM trigger is not good. Currently, IVM uses tuplestores and ephemeral named relation (ENR) instead of temporary tables. it doesn't cause previous problem like below: testdb=# create table b1 (id integer, x numeric(10,3)); CREATE TABLE testdb=# create incremental materialized view mv1 testdb-# as select id, count(*),sum(x) from b1 group by id; SELECT 0 testdb=# testdb=# do $$ testdb$# declare testdb$# i integer; testdb$# begin testdb$# for i in 1..10000 testdb$# loop testdb$# insert into b1 values (1,1); testdb$# end loop; testdb$# end; testdb$# $$ testdb-# ; DO testdb=# This issue is reported by PAscal. https://www.postgresql.org/message-id/1577564109604-0.post@n3.nabble.com * Support pg_dump/pg_restore for IVM IVM supports pg_dump/pg_restore command. * Prohibit rename and unique index creation on IVM columns When a user make a unique index on ivm columns such as ivm_count, IVM will fail due to the unique constraint violation, so IVM prohibits it. Also, rename of these columns also causes IVM fails, so IVM prohibits it too. * Fix incorrect WHERE condition check for outer-join views The check for non null-rejecting condition check was incorrect. Best Regards, Takuma Hoshiai -- Takuma Hoshiai <hoshiai@sraoss.co.jp>
Attachment
> I have tried to use an other patch with yours: > "Planning counters in pg_stat_statements (using pgss_store)" > setting > shared_preload_libraries='pg_stat_statements' > pg_stat_statements.track=all > restarting the cluster and creating the extension > When trying following syntax: > create table b1 (id integer, x numeric(10,3)); > create incremental materialized view mv1 as select id, count(*),sum(x) > from b1 group by id; > insert into b1 values (1,1) > > I got an ASSERT FAILURE in pg_stat_statements.c > on > Assert(query != NULL); > > comming from matview.c > refresh_matview_datafill(dest_old, query, queryEnv, NULL); > or > refresh_matview_datafill(dest_new, query, queryEnv, NULL); > > If this (last) NULL field was replaced by the query text, > a comment or just "n/a", > it would fix the problem. > Could this be investigated ? Hello, thank you for patch v14, that fix problems inherited from temporary tables. it seems that this ASSERT problem with pgss patch is still present ;o( Could we have a look ? Thanks in advance Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Thu, 27 Feb 2020 14:35:55 -0700 (MST) legrand legrand <legrand_legrand@hotmail.com> wrote: > > I have tried to use an other patch with yours: > > "Planning counters in pg_stat_statements (using pgss_store)" > > > setting > > shared_preload_libraries='pg_stat_statements' > > pg_stat_statements.track=all > > restarting the cluster and creating the extension > > > > When trying following syntax: > > > create table b1 (id integer, x numeric(10,3)); > > create incremental materialized view mv1 as select id, count(*),sum(x) > > from b1 group by id; > > insert into b1 values (1,1) > > > > I got an ASSERT FAILURE in pg_stat_statements.c > > on > > Assert(query != NULL); > > > > comming from matview.c > > refresh_matview_datafill(dest_old, query, queryEnv, NULL); > > or > > refresh_matview_datafill(dest_new, query, queryEnv, NULL); > > > > If this (last) NULL field was replaced by the query text, > > a comment or just "n/a", > > it would fix the problem. > > > Could this be investigated ? > > Hello, > > thank you for patch v14, that fix problems inherited from temporary tables. > it seems that this ASSERT problem with pgss patch is still present ;o( > > Could we have a look ? Sorry but we are busy on fixing and improving IVM patches. I think fixing the assertion failure needs non trivial changes to other part of PosthreSQL. So we would like to work on the issue you reported after the pgss patch gets committed. Best Regards, Takuma Hoshiai > Thanks in advance > Regards > PAscal > > > > -- > Sent from: > https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html > > > > > > -- > Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html > > > -- Takuma Hoshiai <hoshiai@sraoss.co.jp>
>> thank you for patch v14, that fix problems inherited from temporary tables. >> it seems that this ASSERT problem with pgss patch is still present ;o( >> > > Sorry but we are busy on fixing and improving IVM patches. I think fixing > the assertion failure needs non trivial changes to other part of > PosthreSQL. > So we would like to work on the issue you reported after the pgss patch > gets committed. Imagine it will happen tomorrow ! You may say I'm a dreamer But I'm not the only one ... ... -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Hi, Attached is the latest patch (v15) to add support for Incremental Materialized View Maintenance (IVM). It is possible to apply to current latest master branch. Differences from the previous patch (v14) include: * Fix to not use generate_series when views are queried In the previous implementation, multiplicity of each tuple was stored in ivm_count column in views. When SELECT was issued for views with duplicate, the view was replaced with a subquery in which each tuple was joined with generate_series function in order to output tuples of the number of ivm_count. This was problematic for following reasons: - The overhead was huge. When almost of tuples in a view were selected, it took much longer time than the original query. This lost the meaning of materialized views. - Optimizer could not estimate row numbers correctly because this had to know ivm_count values stored in tuples. - System columns of materialized views like cmin, xmin, xmax could not be used because a view was replaced with a subquery. To resolve this, the new implementation doen't store multiplicities for views with tuple duplicates, and doesn't use generate_series when SELECT query is issued for such views. Note that we still have to use ivm_count for supporting DISTINCT and aggregates. * Add query checks for IVM restrictions Query checks for following restrictions are added: - DISTINCT ON - TABLESAMPLE parameter - inheritance parent table - window function - some aggregate options(such as FILTER, DISTINCT, ORDER and GROUPING SETS) - targetlist containing IVM column - simple subquery is only supported - FOR UPDATE/SHARE - empty target list - UNION/INTERSECT/EXCEPT - GROUPING SETS clauses * Improve error messages Add error code ERRCODE_FEATURE_NOT_SUPPORTED to each IVM error message. Also, the message format was unified. * Support subqueries containig joins in FROM clause Previously, when multi tables are updated simultaneously, incremental view maintenance with subqueries including JOIN didn't work correctly due to a bug. Best Regards, Takuma Hoshiai -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
On Fri, 10 Apr 2020 23:26:58 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > Hi, > > Attached is the latest patch (v15) to add support for Incremental Materialized > View Maintenance (IVM). It is possible to apply to current latest master branch. I found a mistake of splitting patch, so I attached the fixed patch (v15a). > Differences from the previous patch (v14) include: > > * Fix to not use generate_series when views are queried > > In the previous implementation, multiplicity of each tuple was stored > in ivm_count column in views. When SELECT was issued for views with > duplicate, the view was replaced with a subquery in which each tuple > was joined with generate_series function in order to output tuples > of the number of ivm_count. > > This was problematic for following reasons: > > - The overhead was huge. When almost of tuples in a view were selected, > it took much longer time than the original query. This lost the meaning > of materialized views. > > - Optimizer could not estimate row numbers correctly because this had to > know ivm_count values stored in tuples. > > - System columns of materialized views like cmin, xmin, xmax could not > be used because a view was replaced with a subquery. > > To resolve this, the new implementation doen't store multiplicities > for views with tuple duplicates, and doesn't use generate_series > when SELECT query is issued for such views. > > Note that we still have to use ivm_count for supporting DISTINCT and > aggregates. I also explain the way of updating views with tuple duplicates. Although a view itself doesn't have ivm_count column, multiplicities for old delta and new delta are calculated and the count value is contained in a column named __ivm_count__ in each delta table. The old delta table is applied using ctid and row_number function. row_number is used to numbering tuples in the view, and tuples whose number is equal or is less than __ivm_count__ are deleted from the view using a query like: DELETE FROM matviewname WHERE ctid IN ( SELECT tid FROM ( SELECT row_number() over (partition by c1, c2, ...) AS __ivm_row_number__, mv.ctid AS tid, diff.__ivm_count__ FROM matviewname AS mv, old_delta AS diff " WHERE mv.c1 = diff.c1 AND mv.c2 = diff.c2 AND ... ) v WHERE v.__ivm_row_number__ <= v.__ivm_count__ The new delta is applied using generate_seriese to insert mutiple same tuples, using a query like: INSERT INTO matviewname (c1, c2, ...) SELECT c1,c2,... FROM ( SELECT diff.*, generate_series( > > * Add query checks for IVM restrictions > > Query checks for following restrictions are added: > > - DISTINCT ON > - TABLESAMPLE parameter > - inheritance parent table > - window function > - some aggregate options(such as FILTER, DISTINCT, ORDER and GROUPING SETS) > - targetlist containing IVM column > - simple subquery is only supported > - FOR UPDATE/SHARE > - empty target list > - UNION/INTERSECT/EXCEPT > - GROUPING SETS clauses > > * Improve error messages > > Add error code ERRCODE_FEATURE_NOT_SUPPORTED to each IVM error message. > Also, the message format was unified. > > * Support subqueries containig joins in FROM clause > > Previously, when multi tables are updated simultaneously, incremental > view maintenance with subqueries including JOIN didn't work correctly > due to a bug. > > Best Regards, > Takuma Hoshiai > > -- > Yugo NAGATA <nagata@sraoss.co.jp> -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
>> Hi, >> >> Attached is the latest patch (v15) to add support for Incremental Materialized >> View Maintenance (IVM). It is possible to apply to current latest master branch. I have tried to use IVM against TPC-DS (http://www.tpc.org/tpcds/) queries. TPC-DS models decision support systems and those queries are modestly complex. So I thought applying IVM to those queries could show how IVM covers real world queries. Since IVM does not support queries including ORDER BY and LIMIT, I removed them from the queries before the test. Here are some facts so far learned in this attempt. - Number of TPC-DS query files is 99. - IVM was successfully applied to 20 queries. - 33 queries failed because they use WITH clause (CTE) (currenly IVM does not support CTE). - Error messages from failed queries (except those using WITH) are below: (the number indicates how many queries failed by the same reason) 11 aggregate functions in nested query are not supported on incrementally maintainable materialized view 8 window functions are not supported on incrementally maintainable materialized view 7 UNION/INTERSECT/EXCEPT statements are not supported on incrementally maintainable materialized view 5 WHERE clause only support subquery with EXISTS clause 3 GROUPING SETS, ROLLUP, or CUBE clauses is not supported on incrementally maintainable materialized view 3 aggregate function and EXISTS condition are not supported at the same time 2 GROUP BY expression not appeared in select list is not supported on incrementally maintainable materialized view 2 aggregate function with DISTINCT arguments is not supported on incrementally maintainable materialized view 2 aggregate is not supported with outer join 1 aggregate function stddev_samp(integer) is not supported on incrementally maintainable materialized view 1 HAVING clause is not supported on incrementally maintainable materialized view 1 subquery is not supported with outer join 1 column "avg" specified more than once Attached are the queries IVM are successfully applied. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Attachment
>> Hi,
>>
>> Attached is the latest patch (v15) to add support for Incremental Materialized
>> View Maintenance (IVM). It is possible to apply to current latest master branch.
I have tried to use IVM against TPC-DS (http://www.tpc.org/tpcds/)
queries. TPC-DS models decision support systems and those queries are
modestly complex. So I thought applying IVM to those queries could
show how IVM covers real world queries.
>> +1, This is a smart idea. How did you test it? AFAIK, we can test it > with: > > 1. For any query like SELECT xxx, we create view like CREATE MATERIAL VIEW > mv_name as SELECT xxx; to test if the features in the query are supported. No I didn't test the correctness of IVM with TPC-DS data for now. TPC-DS comes with a data generator and we can test IVM something like: SELECT * FROM IVM_vew EXCEPT SELECT ... (TPC-DS original query); If this produces 0 row, then the IVM is correct for the initial data. (of course actually we need to add appropreate ORDER BY and LIMIT clause to the SELECT statement for IVM if neccessary). > 2. Update the data and then compare the result with SELECT XXX with SELECT > * from mv_name to test if the data is correctly sync. I wanted to test the data updating but I am still struggling how to extract correct updating data from TPC-DS data set. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Query checks for following restrictions are added:
- inheritance parent table
...
- targetlist containing IVM column
- simple subquery is only supported
>> Query checks for following restrictions are added: > > > Are all known supported cases listed below? They are "restrictions" and are not supported. > >> - inheritance parent table >> ... >> - targetlist containing IVM column >> - simple subquery is only supported >> > > How to understand 3 items above? The best way to understand them is looking into regression test. src/test/regress/expected/incremental_matview.out. >> - inheritance parent table -- inheritance parent is not supported with IVM" BEGIN; CREATE TABLE parent (i int, v int); CREATE TABLE child_a(options text) INHERITS(parent); CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm21 AS SELECT * FROM parent; ERROR: inheritance parent is not supported on incrementally maintainable materialized view >> - targetlist containing IVM column -- tartget list cannot contain ivm clumn that start with '__ivm' CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm28 AS SELECT i AS "__ivm_count__" FROM mv_base_a; ERROR: column name __ivm_count__ is not supported on incrementally maintainable materialized view >> - simple subquery is only supported -- subquery is not supported with outer join CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a a LEFT JOIN (SELECT * FROM mv_base_b) b ONa.i=b.i; ERROR: this query is not allowed on incrementally maintainable materialized view HINT: subquery is not supported with outer join Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
>> Query checks for following restrictions are added:
>
>
> Are all known supported cases listed below?
They are "restrictions" and are not supported.
>
>> - inheritance parent table
>> ...
>> - targetlist containing IVM column
>> - simple subquery is only supported
>>
>
> How to understand 3 items above?
The best way to understand them is looking into regression test.
Hi, Attached is the rebased patch (v16) to add support for Incremental Materialized View Maintenance (IVM). It is able to be applied to current latest master branch. This also includes the following small fixes: - Add a query check for expressions containing aggregates in it - [doc] Add description about which situations IVM is effective or not in - Improve hint in log messages - Reorganize include directives in codes Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
I have looked into this. > Hi, > > Attached is the rebased patch (v16) to add support for Incremental > Materialized View Maintenance (IVM). It is able to be applied to > current latest master branch. > > This also includes the following small fixes: > > - Add a query check for expressions containing aggregates in it > - [doc] Add description about which situations IVM is effective or not in > - Improve hint in log messages > - Reorganize include directives in codes - make check passed. - make check-world passed. - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch: This one needs a comment to describe what the function does etc. +void +SetTransitionTablePreserved(Oid relid, CmdType cmdType) +{ - 0007-Add-aggregates-support-in-IVM.patch "Check if the given aggregate function is supporting" shouldn't be "Check if the given aggregate function is supporting IVM"? + * check_aggregate_supports_ivm + * + * Check if the given aggregate function is supporting Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On Wed, 19 Aug 2020 10:02:42 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > I have looked into this. Thank you for your reviewing! > - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch: > This one needs a comment to describe what the function does etc. > > +void > +SetTransitionTablePreserved(Oid relid, CmdType cmdType) > +{ I added a comment for this function and related places. +/* + * SetTransitionTablePreserved + * + * Prolong lifespan of transition tables corresponding specified relid and + * command type to the end of the outmost query instead of each nested query. + * This enables to use nested AFTER trigger's transition tables from outer + * query's triggers. Currently, only immediate incremental view maintenance + * uses this. + */ +void +SetTransitionTablePreserved(Oid relid, CmdType cmdType) Also, I removed releted unnecessary code which was left accidentally. > - 0007-Add-aggregates-support-in-IVM.patch > "Check if the given aggregate function is supporting" shouldn't be > "Check if the given aggregate function is supporting IVM"? Yes, you are right. I fixed this, too. > > + * check_aggregate_supports_ivm > + * > + * Check if the given aggregate function is supporting Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
From: Yugo NAGATA <nagata@sraoss.co.jp> Subject: Re: Implementing Incremental View Maintenance Date: Fri, 21 Aug 2020 17:23:20 +0900 Message-ID: <20200821172320.a2506577d5244b6066f69331@sraoss.co.jp> > On Wed, 19 Aug 2020 10:02:42 +0900 (JST) > Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > >> I have looked into this. > > Thank you for your reviewing! > >> - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch: >> This one needs a comment to describe what the function does etc. >> >> +void >> +SetTransitionTablePreserved(Oid relid, CmdType cmdType) >> +{ > > I added a comment for this function and related places. > > +/* > + * SetTransitionTablePreserved > + * > + * Prolong lifespan of transition tables corresponding specified relid and > + * command type to the end of the outmost query instead of each nested query. > + * This enables to use nested AFTER trigger's transition tables from outer > + * query's triggers. Currently, only immediate incremental view maintenance > + * uses this. > + */ > +void > +SetTransitionTablePreserved(Oid relid, CmdType cmdType) > > Also, I removed releted unnecessary code which was left accidentally. > > >> - 0007-Add-aggregates-support-in-IVM.patch >> "Check if the given aggregate function is supporting" shouldn't be >> "Check if the given aggregate function is supporting IVM"? > > Yes, you are right. I fixed this, too. > >> >> + * check_aggregate_supports_ivm >> + * >> + * Check if the given aggregate function is supporting Thanks for the fixes. I have changed the commit fest status to "Ready for Committer". Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hi, I updated the wiki page. https://wiki.postgresql.org/wiki/Incremental_View_Maintenance On Fri, 21 Aug 2020 21:40:50 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > From: Yugo NAGATA <nagata@sraoss.co.jp> > Subject: Re: Implementing Incremental View Maintenance > Date: Fri, 21 Aug 2020 17:23:20 +0900 > Message-ID: <20200821172320.a2506577d5244b6066f69331@sraoss.co.jp> > > > On Wed, 19 Aug 2020 10:02:42 +0900 (JST) > > Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > > >> I have looked into this. > > > > Thank you for your reviewing! > > > >> - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch: > >> This one needs a comment to describe what the function does etc. > >> > >> +void > >> +SetTransitionTablePreserved(Oid relid, CmdType cmdType) > >> +{ > > > > I added a comment for this function and related places. > > > > +/* > > + * SetTransitionTablePreserved > > + * > > + * Prolong lifespan of transition tables corresponding specified relid and > > + * command type to the end of the outmost query instead of each nested query. > > + * This enables to use nested AFTER trigger's transition tables from outer > > + * query's triggers. Currently, only immediate incremental view maintenance > > + * uses this. > > + */ > > +void > > +SetTransitionTablePreserved(Oid relid, CmdType cmdType) > > > > Also, I removed releted unnecessary code which was left accidentally. > > > > > >> - 0007-Add-aggregates-support-in-IVM.patch > >> "Check if the given aggregate function is supporting" shouldn't be > >> "Check if the given aggregate function is supporting IVM"? > > > > Yes, you are right. I fixed this, too. > > > >> > >> + * check_aggregate_supports_ivm > >> + * > >> + * Check if the given aggregate function is supporting > > Thanks for the fixes. I have changed the commit fest status to "Ready > for Committer". > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp > > -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi Nagata-san, On Mon, Aug 31, 2020 at 5:32 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > https://wiki.postgresql.org/wiki/Incremental_View_Maintenance Thanks for writing this! + /* + * Wait for concurrent transactions which update this materialized view at + * READ COMMITED. This is needed to see changes committed in other + * transactions. No wait and raise an error at REPEATABLE READ or + * SERIALIZABLE to prevent update anomalies of matviews. + * XXX: dead-lock is possible here. + */ + if (!IsolationUsesXactSnapshot()) + LockRelationOid(matviewOid, ExclusiveLock); + else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock)) Could you please say a bit more about your plans for concurrency control? Simple hand-crafted "rollup" triggers typically conflict only when modifying the same output rows due to update/insert conflicts, or perhaps some explicit row level locking if they're doing something complex (unfortunately, they also very often have concurrency bugs...). In some initial reading about MV maintenance I did today in the hope of understanding some more context for this very impressive but rather intimidating patch set, I gained the impression that aggregate-row locking granularity is assumed as a baseline for eager incremental aggregate maintenance. I understand that our MVCC/snapshot scheme introduces extra problems, but I'm wondering if these problems can be solved using the usual update semantics (the EvalPlanQual mechanism), and perhaps also some UPSERT logic. Why is it not sufficient to have locked all the base table rows that you have modified, captured the before-and-after values generated by those updates, and also locked all the IMV aggregate rows you will read, and in the process acquired a view of the latest committed state of the IMV aggregate rows you will modify (possibly having waited first)? In other words, what other data do you look at, while computing the incremental update, that might suffer from anomalies because of snapshots and concurrency? For one thing, I am aware that unique indexes for groups would probably be necessary; perhaps some subtle problems of the sort usually solved with predicate locks lurk there? (Newer papers describe locking schemes that avoid even aggregate-row level conflicts, by taking advantage of the associativity and commutativity of aggregates like SUM and COUNT. You can allow N writers to update the aggregate concurrently, and if any transaction has to roll back it subtracts what it added, not necessarily restoring the original value, so that nobody conflicts with anyone else, or something like that... Contemplating an MVCC, no-rollbacks version of that sort of thing leads to ideas like, I dunno, update chains containing differential update trees to be compacted later... egad!)
Hi Thomas, Thank you for your comment! On Sat, 5 Sep 2020 17:56:18 +1200 Thomas Munro <thomas.munro@gmail.com> wrote: > + /* > + * Wait for concurrent transactions which update this materialized view at > + * READ COMMITED. This is needed to see changes committed in other > + * transactions. No wait and raise an error at REPEATABLE READ or > + * SERIALIZABLE to prevent update anomalies of matviews. > + * XXX: dead-lock is possible here. > + */ > + if (!IsolationUsesXactSnapshot()) > + LockRelationOid(matviewOid, ExclusiveLock); > + else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock)) > > Could you please say a bit more about your plans for concurrency control? > > Simple hand-crafted "rollup" triggers typically conflict only when > modifying the same output rows due to update/insert conflicts, or > perhaps some explicit row level locking if they're doing something > complex (unfortunately, they also very often have concurrency > bugs...). In some initial reading about MV maintenance I did today in > the hope of understanding some more context for this very impressive > but rather intimidating patch set, I gained the impression that > aggregate-row locking granularity is assumed as a baseline for eager > incremental aggregate maintenance. I understand that our > MVCC/snapshot scheme introduces extra problems, but I'm wondering if > these problems can be solved using the usual update semantics (the > EvalPlanQual mechanism), and perhaps also some UPSERT logic. Why is > it not sufficient to have locked all the base table rows that you have > modified, captured the before-and-after values generated by those > updates, and also locked all the IMV aggregate rows you will read, and > in the process acquired a view of the latest committed state of the > IMV aggregate rows you will modify (possibly having waited first)? In > other words, what other data do you look at, while computing the > incremental update, that might suffer from anomalies because of > snapshots and concurrency? For one thing, I am aware that unique > indexes for groups would probably be necessary; perhaps some subtle > problems of the sort usually solved with predicate locks lurk there? I decided to lock a matview considering views joining tables. For example, let V = R*S is an incrementally maintainable materialized view which joins tables R and S. Suppose there are two concurrent transactions T1 which changes table R to R' and T2 which changes S to S'. Without any lock, in READ COMMITTED mode, V would be updated to represent V=R'*S in T1, and V=R*S' in T2, so it would cause inconsistency. By locking the view V, transactions T1, T2 are processed serially and this inconsistency can be avoided. I also thought it might be resolved using tuple locks and EvalPlanQual instead of table level lock, but there is still a unavoidable case. For example, suppose that tuple dR is inserted into R in T1, and dS is inserted into S in T2. Also, suppose that dR and dS will be joined in according to the view definition. In this situation, without any lock, the change of V is computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not be included in the results. This causes inconsistency. I don't think this could be resolved even if we use tuple locks. As to aggregate view without join , however, we might be able to use a lock of more low granularity as you said, because if rows belonging a group in a table is changes, we just update (or delete) corresponding rows in the view. Even if there are concurrent transactions updating the same table, we would be able to make one of them wait using tuple lock. If concurrent transactions are trying to insert a tuple into the same table, we might need to use unique index and UPSERT to avoid to insert multiple rows with same group key into the view. Therefore, usual update semantics (tuple locks and EvalPlanQual) and UPSERT can be used for optimization for some classes of view, but we don't have any other better idea than using table lock for views joining tables. We would appreciate it if you could suggest better solution. > (Newer papers describe locking schemes that avoid even aggregate-row > level conflicts, by taking advantage of the associativity and > commutativity of aggregates like SUM and COUNT. You can allow N > writers to update the aggregate concurrently, and if any transaction > has to roll back it subtracts what it added, not necessarily restoring > the original value, so that nobody conflicts with anyone else, or > something like that... Contemplating an MVCC, no-rollbacks version of > that sort of thing leads to ideas like, I dunno, update chains > containing differential update trees to be compacted later... egad!) I am interested in papers you mentioned! Are they literatures in context of incremental view maintenance? Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Wed, Sep 9, 2020 at 12:29 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > I also thought it might be resolved using tuple locks and EvalPlanQual > instead of table level lock, but there is still a unavoidable case. For > example, suppose that tuple dR is inserted into R in T1, and dS is inserted > into S in T2. Also, suppose that dR and dS will be joined in according to > the view definition. In this situation, without any lock, the change of V is > computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not > be included in the results. This causes inconsistency. I don't think this > could be resolved even if we use tuple locks. I see. Thanks for the explanation! > As to aggregate view without join , however, we might be able to use a lock > of more low granularity as you said, because if rows belonging a group in a > table is changes, we just update (or delete) corresponding rows in the view. > Even if there are concurrent transactions updating the same table, we would > be able to make one of them wait using tuple lock. If concurrent transactions > are trying to insert a tuple into the same table, we might need to use unique > index and UPSERT to avoid to insert multiple rows with same group key into > the view. > > Therefore, usual update semantics (tuple locks and EvalPlanQual) and UPSERT > can be used for optimization for some classes of view, but we don't have any > other better idea than using table lock for views joining tables. We would > appreciate it if you could suggest better solution. I have nothing, I'm just reading starter papers and trying to learn a bit more about the concepts at this stage. I was thinking of reviewing some of the more mechanical parts of the patch set, though, like perhaps the transition table lifetime management, since I have worked on that area before. > > (Newer papers describe locking schemes that avoid even aggregate-row > > level conflicts, by taking advantage of the associativity and > > commutativity of aggregates like SUM and COUNT. You can allow N > > writers to update the aggregate concurrently, and if any transaction > > has to roll back it subtracts what it added, not necessarily restoring > > the original value, so that nobody conflicts with anyone else, or > > something like that... Contemplating an MVCC, no-rollbacks version of > > that sort of thing leads to ideas like, I dunno, update chains > > containing differential update trees to be compacted later... egad!) > > I am interested in papers you mentioned! Are they literatures in context of > incremental view maintenance? Yeah. I was skim-reading some parts of [1] including section 2.5.1 "Concurrency Control", which opens with some comments about aggregates, locking and pointers to "V-locking" [2] for high concurrency aggregates. There is also a pointer to G. Graefe and M. J. Zwilling, "Transaction support for indexed views," which I haven't located; apparently indexed views are Graefe's name for MVs, and apparently this paper has a section on MVCC systems which sounds interesting for us. [1] https://dsf.berkeley.edu/cs286/papers/mv-fntdb2012.pdf [2] http://pages.cs.wisc.edu/~gangluo/latch.pdf
On Wed, 9 Sep 2020 14:22:28 +1200 Thomas Munro <thomas.munro@gmail.com> wrote: > > Therefore, usual update semantics (tuple locks and EvalPlanQual) and UPSERT > > can be used for optimization for some classes of view, but we don't have any > > other better idea than using table lock for views joining tables. We would > > appreciate it if you could suggest better solution. > > I have nothing, I'm just reading starter papers and trying to learn a > bit more about the concepts at this stage. I was thinking of > reviewing some of the more mechanical parts of the patch set, though, > like perhaps the transition table lifetime management, since I have > worked on that area before. Thank you for your interrest. It would be greatly appreciated if you could review the patch. > > > (Newer papers describe locking schemes that avoid even aggregate-row > > > level conflicts, by taking advantage of the associativity and > > > commutativity of aggregates like SUM and COUNT. You can allow N > > > writers to update the aggregate concurrently, and if any transaction > > > has to roll back it subtracts what it added, not necessarily restoring > > > the original value, so that nobody conflicts with anyone else, or > > > something like that... Contemplating an MVCC, no-rollbacks version of > > > that sort of thing leads to ideas like, I dunno, update chains > > > containing differential update trees to be compacted later... egad!) > > > > I am interested in papers you mentioned! Are they literatures in context of > > incremental view maintenance? > > Yeah. I was skim-reading some parts of [1] including section 2.5.1 > "Concurrency Control", which opens with some comments about > aggregates, locking and pointers to "V-locking" [2] for high > concurrency aggregates. There is also a pointer to G. Graefe and M. > J. Zwilling, "Transaction support for indexed views," which I haven't > located; apparently indexed views are Graefe's name for MVs, and > apparently this paper has a section on MVCC systems which sounds > interesting for us. > > [1] https://dsf.berkeley.edu/cs286/papers/mv-fntdb2012.pdf > [2] http://pages.cs.wisc.edu/~gangluo/latch.pdf Thanks for your information! I will also check references regarding with IVM and concurrency control. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
> I have nothing, I'm just reading starter papers and trying to learn a > bit more about the concepts at this stage. I was thinking of > reviewing some of the more mechanical parts of the patch set, though, > like perhaps the transition table lifetime management, since I have > worked on that area before. Do you have comments on this part? I am asking because these patch sets are now getting closer to committable state in my opinion, and if there's someting wrong, it should be fixed soon so that these patches are getting into the master branch. I think this feature has been long awaited by users and merging the patches should be a benefit for them. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
> On Thu, Sep 17, 2020 at 09:42:45AM +0900, Tatsuo Ishii wrote: >> I am asking because these patch sets are now getting closer to >> committable state in my opinion, and if there's someting wrong, it >> should be fixed soon so that these patches are getting into the master >> branch. >> >> I think this feature has been long awaited by users and merging the >> patches should be a benefit for them. > > I don't have much thoughts to offer about that, but this patch is > failing to apply, so a rebase is at least necessary. Yes. I think he is going to post a new patch (possibly with enhancements) soon. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On 2020/10/01 13:03, Tatsuo Ishii wrote: >> On Thu, Sep 17, 2020 at 09:42:45AM +0900, Tatsuo Ishii wrote: >>> I am asking because these patch sets are now getting closer to >>> committable state in my opinion, and if there's someting wrong, it >>> should be fixed soon so that these patches are getting into the master >>> branch. >>> >>> I think this feature has been long awaited by users and merging the >>> patches should be a benefit for them. >> >> I don't have much thoughts to offer about that, but this patch is >> failing to apply, so a rebase is at least necessary. > > Yes. I think he is going to post a new patch (possibly with > enhancements) soon. When I glanced the doc patch (i.e., 0012), I found some typos. + <command>CRATE INCREMENTAL MATERIALIZED VIEW</command>, for example: Typo: CRATE should be CREATE ? + with <literal>__ivm_</literal> and they contains information required Typo: contains should be contain ? + For exmaple, here are two materialized views based on the same view Typo: exmaple should be example ? + maintenance can be lager than <command>REFRESH MATERIALIZED VIEW</command> Typo: lager should be larger ? +postgres=# SELECt * FROM m; -- automatically updated Typo: SELECt should be SELECT ? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Thu, 1 Oct 2020 13:43:49 +0900 Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > When I glanced the doc patch (i.e., 0012), I found some typos. Thank you for your pointing out typos! I'll fix it. > > + <command>CRATE INCREMENTAL MATERIALIZED VIEW</command>, for example: > > Typo: CRATE should be CREATE ? > > + with <literal>__ivm_</literal> and they contains information required > > Typo: contains should be contain ? > > + For exmaple, here are two materialized views based on the same view > > Typo: exmaple should be example ? > > + maintenance can be lager than <command>REFRESH MATERIALIZED VIEW</command> > > Typo: lager should be larger ? > > +postgres=# SELECt * FROM m; -- automatically updated > > Typo: SELECt should be SELECT ? > > Regards, > > -- > Fujii Masao > Advanced Computing Technology Center > Research and Development Headquarters > NTT DATA CORPORATION -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi, Attached is the rebased patch (v18) to add support for Incremental Materialized View Maintenance (IVM). It is able to be applied to current latest master branch. Also, this now supports simple CTEs (WITH clauses) which do not contain aggregates or DISTINCT like simple sub-queries. This feature is provided as an additional patch segment "0010-Add-CTE-support-in-IVM.patch". ==== Example ==== cte=# TABLE r; i | v ---+---- 1 | 10 2 | 20 (2 rows) cte=# TABLE s; i | v ---+----- 2 | 200 3 | 300 (2 rows) cte=# \d+ mv Materialized view "public.mv" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------+-----------+----------+---------+---------+--------------+------------- r | integer | | | | plain | | x | integer | | | | plain | | View definition: WITH x AS ( SELECT s.i, s.v FROM s ) SELECT r.v AS r, x.v AS x FROM r, x WHERE r.i = x.i; Access method: heap Incremental view maintenance: yes cte=# SELECT * FROM mv; r | x ----+----- 20 | 200 (1 row) cte=# INSERT INTO r VALUES (3,30); INSERT 0 1 cte=# INSERT INTO s VALUES (1,100); INSERT 0 1 cte=# SELECT * FROM mv; r | x ----+----- 20 | 200 30 | 300 10 | 100 (3 rows) ====================== Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
Hi, I have reviewed the past discussions in this thread on IVM implementation of the proposed patch[1], and summarized it as following . We would appreciate any comments or suggestions on the patch as regard of them. * Aggregate support The current patch supports several built-in aggregates, that is, count, sum, avg, min, and max. Other built-in aggregates or user-defined aggregates are not supported. Aggregates in a materialized view definition is checked if this is supported using OIDs of aggregate function. For this end, Gen_fmgrtab.pl is changed to output aggregate function's OIDs to fmgroids.h (by 0006-Change-Gen_fmgrtab.pl-to-output-aggregate-function-s.patch). The logic for each aggregate function to update aggregate values in materialized views is enbedded in a trigger function. There was another option in the past discussion. That is, we could add one or more new attribute to pg_aggregate which provides information about if each aggregate function supports IVM and its logic[2]. If we have a mechanism to support IVM in pg_aggregate, we may use more general aggregate functions including user-defined aggregate in materialized views for IVM. For example, the current pg_aggregate has aggcombinefn attribute for supporting partial aggregation. Maybe we could use combine functions to calculate new aggregate values in materialized views when tuples are inserted into a base table. However, in the context of IVM, we also need other function used when tuples are deleted from a base table, so we can not use partial aggregation for IVM in the current implementation. Maybe, we could support the deletion case by adding a new support function, say, "inverse combine function". The "inverse combine function" would take aggregate value in a materialized view and aggregate value calculated from a delta of view, and produces the new aggregate value which equals the result after tuples in a base table are deleted. However, we don't have concrete plan for the new design of pg_aggregate. In addition, even if make a new support function in pg_aggregate for IVM, we can't use this in the current IVM code because our code uses SQL via SPI in order to update a materialized view and we can't call "internal" type function directly in SQL. For these reasons, in the current patch, we decided to left supporting general aggregates to the next version for simplicity, so the current patch supports only some built-in aggregates and checks if they can be used in IVM by their OIDs. * Hidden columns For supporting aggregates, DISTINCT, and EXISTS, the current implementation automatically create hidden columns whose name starts with "__ivm_" in materialized views. The columns starting with "__ivm_" are hidden, so when "SELECT * FROM ..." is issued to a materialized view, these are invisible for users. Users can not use such name as a user column in materialized views with IVM support. As for how to make internal columns invisible to SELECT *, previously there have been discussions about doing that using a new flag in pg_attribute[3]. However, the discussion is no longer active. So, we decided to use column name for checking if this is special or not in our IVM implementation for now. * TRUNCATE support Currently, TRUNCATE on base tables are not supported. When TRUNCATE command is executed on a base table, it is ignored and nothing occurs on materialized views. There are another options as followings: - Raise an error or warning when a base table is TRUNCATEed. - Make the view non-scannable (like REFRESH WITH NO DATA) - Update the view in any way. It would be easy for inner joins or aggregate views, but there is some difficult with outer joins. Which is the best way? Should we support TRUNCATE in the first version? Any suggestions would be greatly appreciated. [1] https://wiki.postgresql.org/wiki/Incremental_View_Maintenance [2] https://www.postgresql.org/message-id/20191129173328.e5a0e9f81e369a3769c4fd0c%40sraoss.co.jp [3] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com Regard, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
> * Aggregate support > > The current patch supports several built-in aggregates, that is, count, sum, > avg, min, and max. Other built-in aggregates or user-defined aggregates are > not supported. > > Aggregates in a materialized view definition is checked if this is supported > using OIDs of aggregate function. For this end, Gen_fmgrtab.pl is changed to > output aggregate function's OIDs to fmgroids.h > (by 0006-Change-Gen_fmgrtab.pl-to-output-aggregate-function-s.patch). > The logic for each aggregate function to update aggregate values in > materialized views is enbedded in a trigger function. > > There was another option in the past discussion. That is, we could add one > or more new attribute to pg_aggregate which provides information about if > each aggregate function supports IVM and its logic[2]. If we have a mechanism > to support IVM in pg_aggregate, we may use more general aggregate functions > including user-defined aggregate in materialized views for IVM. > > For example, the current pg_aggregate has aggcombinefn attribute for > supporting partial aggregation. Maybe we could use combine functions to > calculate new aggregate values in materialized views when tuples are > inserted into a base table. However, in the context of IVM, we also need > other function used when tuples are deleted from a base table, so we can not > use partial aggregation for IVM in the current implementation. > > Maybe, we could support the deletion case by adding a new support function, > say, "inverse combine function". The "inverse combine function" would take > aggregate value in a materialized view and aggregate value calculated from a > delta of view, and produces the new aggregate value which equals the result > after tuples in a base table are deleted. > > However, we don't have concrete plan for the new design of pg_aggregate. > In addition, even if make a new support function in pg_aggregate for IVM, > we can't use this in the current IVM code because our code uses SQL via SPI > in order to update a materialized view and we can't call "internal" type > function directly in SQL. > > For these reasons, in the current patch, we decided to left supporting > general aggregates to the next version for simplicity, so the current patch > supports only some built-in aggregates and checks if they can be used in IVM > by their OIDs. Current patch for IVM is already large. I think implementing above will make the patch size even larger, which makes reviewer's work difficult. So I personally think we should commit the patch as it is, then enhance IVM to support user defined and other aggregates in later version of PostgreSQL. However, if supporting user defined and other aggregates is quite important for certain users, then we should rethink about this. It will be nice if we could know how high such demand is. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hi Adam Brusselback, On Mon, 31 Dec 2018 11:20:11 -0500 Adam Brusselback <adambrusselback@gmail.com> wrote: > Hi all, just wanted to say I am very happy to see progress made on this, > my codebase has multiple "materialized tables" which are maintained with > statement triggers (transition tables) and custom functions. They are ugly > and a pain to maintain, but they work because I have no other > solution...for now at least. We are want to find sutable use cases of the IVM patch being discussed in this thread, and I remembered your post that said you used statement triggers and custom functions. We hope the patch will help you. The patch implements IVM of immediate, that is, eager approach. Materialized views are updated immediately when its base tables are modified. While the view is always up-to-date, there is a overhead on base table modification. We would appreciate it if you could tell us what your use cases of materialized view is and whether our implementation suits your needs or not. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi Adam, On Thu, 22 Oct 2020 10:07:29 -0400 Adam Brusselback <adambrusselback@gmail.com> wrote: > Hey there Yugo, > I've asked a coworker to prepare a self contained example that encapsulates > our multiple use cases. Thank you very much! > The immediate/eager approach is exactly what we need, as within the same > transaction we have statements that can cause one of those "materialized > tables" to be updated, and then sometimes have the need to query that > "materialized table" in a subsequent statement and need to see the changes > reflected. The proposed patch provides the exact this feature and I think this will meet your needs. > As soon as my coworker gets that example built up I'll send a followup with > it attached. Great! We are looking forward to it. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi Adam,
On Thu, 22 Oct 2020 10:07:29 -0400
Adam Brusselback <adambrusselback@gmail.com> wrote:
> Hey there Yugo,
> I've asked a coworker to prepare a self contained example that encapsulates
> our multiple use cases.
Thank you very much!
> The immediate/eager approach is exactly what we need, as within the same
> transaction we have statements that can cause one of those "materialized
> tables" to be updated, and then sometimes have the need to query that
> "materialized table" in a subsequent statement and need to see the changes
> reflected.
The proposed patch provides the exact this feature and I think this will meet
your needs.
> As soon as my coworker gets that example built up I'll send a followup with
> it attached.
Great! We are looking forward to it.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
Hi Anastasia Lubennikova, I am writing this to you because I would like to ask the commitfest manager something. The status of the patch was changed to "Waiting on Author" from "Ready for Committer" at the beginning of this montfor the reason that rebase was necessary. Now I updated the patch, so can I change the status back to "Ready for Committer"? Regards, Yugo Nagata On Mon, 5 Oct 2020 18:16:18 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > Hi, > > Attached is the rebased patch (v18) to add support for Incremental > Materialized View Maintenance (IVM). It is able to be applied to > current latest master branch. -- Yugo NAGATA <nagata@sraoss.co.jp>
On Tue, 27 Oct 2020 12:14:52 -0400 Adam Brusselback <adambrusselback@gmail.com> wrote: > That was a good bit more work to get ready than I expected. It's broken > into two scripts, one to create the schema, the other to load data and > containing a couple check queries to ensure things are working properly > (checking the materialized tables against a regular view for accuracy). Thank you very much! I am really grateful. > The first test case is to give us a definitive result on what "agreed > pricing" is in effect at a point in time based on a product hierarchy > our customers setup, and allow pricing to be set on nodes in that > hierarchy, as well as specific products (with an order of precedence). > The second test case maintains some aggregated amounts / counts / boolean > logic at an "invoice" level for all the detail lines which make up that > invoice. > > Both of these are real-world use cases which were simplified a bit to make > them easier to understand. We have other use cases as well, but with how > much time this took to prepare i'll keep it at this for now. > If you need anything clarified or have any issues, just let me know. Although I have not look into it in details yet, in my understanding, it seems that materialized views are used to show "pricing" or "invoice" information before the order is confirmed, that is, before the transaction is committed. Definitely, these will be use cases where immediate view maintenance is useful. I am happy because I found concrete use cases of immediate IVM. However, unfortunately, the view definitions in your cases are complex, and the current implementation of the patch doesn't support it. We would like to improve the feature in future so that more complex views could benefit from IVM. Regards, Yugo Nagata > On Fri, Oct 23, 2020 at 3:58 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > Hi Adam, > > > > On Thu, 22 Oct 2020 10:07:29 -0400 > > Adam Brusselback <adambrusselback@gmail.com> wrote: > > > > > Hey there Yugo, > > > I've asked a coworker to prepare a self contained example that > > encapsulates > > > our multiple use cases. > > > > Thank you very much! > > > > > The immediate/eager approach is exactly what we need, as within the same > > > transaction we have statements that can cause one of those "materialized > > > tables" to be updated, and then sometimes have the need to query that > > > "materialized table" in a subsequent statement and need to see the > > changes > > > reflected. > > > > The proposed patch provides the exact this feature and I think this will > > meet > > your needs. > > > > > As soon as my coworker gets that example built up I'll send a followup > > with > > > it attached. > > > > Great! We are looking forward to it. > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi Anastasia Lubennikova,
I am writing this to you because I would like to ask the commitfest
manager something.
The status of the patch was changed to "Waiting on Author" from
"Ready for Committer" at the beginning of this montfor the reason
that rebase was necessary. Now I updated the patch, so can I change
the status back to "Ready for Committer"?
Regards,
Yugo Nagata
Yes, go ahead. As far as I see, the patch is in a good shape and there are no unanswered questions from reviewers.
Feel free to change the status of CF entries, when it seems reasonable to you.
See https://en.wikipedia.org/wiki/Posting_style#Top-posting for details.
On Wed, 28 Oct 2020 12:01:58 +0300 Anastasia Lubennikova <lubennikovaav@gmail.com> wrote: > ср, 28 окт. 2020 г. в 08:02, Yugo NAGATA <nagata@sraoss.co.jp>: > > > Hi Anastasia Lubennikova, > > > > I am writing this to you because I would like to ask the commitfest > > manager something. > > > > The status of the patch was changed to "Waiting on Author" from > > "Ready for Committer" at the beginning of this montfor the reason > > that rebase was necessary. Now I updated the patch, so can I change > > the status back to "Ready for Committer"? > > > > Regards, > > Yugo Nagata > > > > > Yes, go ahead. As far as I see, the patch is in a good shape and there are > no unanswered questions from reviewers. > Feel free to change the status of CF entries, when it seems reasonable to > you. Thank you for your response! I get it. > P.S. Please, avoid top-posting, It makes it harder to follow the > discussion, in-line replies are customary in pgsql mailing lists. > See https://en.wikipedia.org/wiki/Posting_style#Top-posting for details. I understand it. Regards, Yugo Nagata > -- > Best regards, > Lubennikova Anastasia -- Yugo NAGATA <nagata@sraoss.co.jp>
On Mon, Oct 05, 2020 at 06:16:18PM +0900, Yugo NAGATA wrote: > Hi, > > Attached is the rebased patch (v18) to add support for Incremental This needs to be rebased again - the last version doesn't apply anymore. http://cfbot.cputube.org/yugo-nagata.html I looked though it a bit and attach some fixes to the user-facing docs. There's some more typos in the source that I didn't fix: constrains materliazied cluase immediaite clumn Temrs migth recalculaetd speified secuirty commit message: comletion psql and pg_dump say 13 but should say 14 now: pset.sversion >= 130000 # bag union big union? + <structfield>relisivm</structfield> <type>bool</type> + </para> + <para> + True if materialized view enables incremental view maintenance This isn't clear, but I think it should say "True for materialized views which are enabled for incremental view maintenance (IVM)." -- Justin
Attachment
On 05.10.2020 12:16, Yugo NAGATA wrote: > Hi, > > Attached is the rebased patch (v18) to add support for Incremental > Materialized View Maintenance (IVM). It is able to be applied to > current latest master branch. > Thank you very much for this work. I consider incremental materialized views as "reincarnation" of OLAP hypercubes. There are two approaches of making OLAP queries faster: 1. speed up query execution (using JIT, columnar store, vector operations and parallel execution) 2. precalculate requested data Incremental materialize views make it possible to implement second approach. But how competitive it is? I do not know current limitations of incremental materialized views, but I checked that basic OLAP functionality: JOIN+GROUP_BY+AGGREGATION is supported. The patch is not applied to the current master because makeFuncCall prototype is changed, I fixed it by adding COAERCE_CALL_EXPLICIT. Then I did the following simple test: 1. Create pgbench database with scale 100. pgbench speed at my desktop is about 10k TPS: pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres tps = 10194.951827 (including connections establishing) 2. Then I created incremental materialized view: create incremental materialized view teller_sums as select t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on a.bid=t.bid group by t.tid; SELECT 1000 Time: 20805.230 ms (00:20.805) 20 second is reasonable time, comparable with time of database initialization. Then obviously we see advantages of precalculated aggregates: postgres=# select * from teller_sums where tid=1; tid | sum -----+-------- 1 | -96427 (1 row) Time: 0.871 ms postgres=# select t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on a.bid=t.bid group by t.tid having t.tid=1 ; tid | sum -----+-------- 1 | -96427 (1 row) Time: 915.508 ms Amazing. Almost 1000 times difference! 3. Run pgbench once again: Ooops! Now TPS are much lower: tps = 141.767347 (including connections establishing) Speed of updates is reduced more than 70 times! Looks like we loose parallelism because almost the same result I get with just one connection. 4. Finally let's create one more view (it is reasonable to expect that analytics will run many different queries and so need multiple views). create incremental materialized view teller_avgs as select t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on a.bid=t.bid group by t.tid; It is great that not only simple aggregates like SUM are supported, but also AVG. But insertion speed now is reduced twice - 72TPS. I tried to make some profiling but didn't see something unusual: 16.41% postgres postgres [.] ExecInterpExpr 8.78% postgres postgres [.] slot_deform_heap_tuple 3.23% postgres postgres [.] ExecMaterial 2.71% postgres postgres [.] AllocSetCheck 2.33% postgres postgres [.] AllocSetAlloc 2.29% postgres postgres [.] slot_getsomeattrs_int 2.26% postgres postgres [.] ExecNestLoop 2.11% postgres postgres [.] MemoryContextReset 1.98% postgres postgres [.] tts_minimal_store_tuple 1.87% postgres postgres [.] heap_compute_data_size 1.78% postgres postgres [.] fill_val 1.56% postgres postgres [.] tuplestore_gettuple 1.44% postgres postgres [.] sentinel_ok 1.35% postgres postgres [.] heap_fill_tuple 1.27% postgres postgres [.] tuplestore_gettupleslot 1.17% postgres postgres [.] ExecQual 1.14% postgres postgres [.] tts_minimal_clear 1.13% postgres postgres [.] CheckOpSlotCompatibility 1.10% postgres postgres [.] base_yyparse 1.10% postgres postgres [.] heapgetpage 1.04% postgres postgres [.] heap_form_minimal_tuple 1.00% postgres postgres [.] slot_getsomeattrs So good news is that incremental materialized views really work. And bad news is that maintenance overhead is too large which significantly restrict applicability of this approach. Certainly in case of dominated read-only workload such materialized views can significantly improve performance. But unfortunately my dream that them allow to combine OLAP+OLPT is not currently realized. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hello Konstantin, I remember testing it with pg_stat_statements (and planning counters enabled). Maybe identifying internal queries associated with this (simple) test case, could help dev team ? Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Thu, 5 Nov 2020 22:58:25 -0600 Justin Pryzby <pryzby@telsasoft.com> wrote: > On Mon, Oct 05, 2020 at 06:16:18PM +0900, Yugo NAGATA wrote: > This needs to be rebased again - the last version doesn't apply anymore. > http://cfbot.cputube.org/yugo-nagata.html I attached the rebased patch (v19). > I looked though it a bit and attach some fixes to the user-facing docs. Thank you for pointing out a lot of typos and making the patch to fix it! Your fixes are included in the latest patch. > There's some more typos in the source that I didn't fix: > constrains > materliazied > cluase > immediaite > clumn > Temrs > migth > recalculaetd > speified > secuirty > > commit message: comletion > > psql and pg_dump say 13 but should say 14 now: > pset.sversion >= 130000 These were also fixed. > # bag union > big union? "bag union" is union operation of bag (multi-set) that does not eliminate duplicate of tuples. > + <structfield>relisivm</structfield> <type>bool</type> > + </para> > + <para> > + True if materialized view enables incremental view maintenance > > This isn't clear, but I think it should say "True for materialized views which > are enabled for incremental view maintenance (IVM)." Yes, you are right. I also fixed it in this way. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
> 1. Create pgbench database with scale 100. > pgbench speed at my desktop is about 10k TPS: > > pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres > tps = 10194.951827 (including connections establishing) > > 2. Then I created incremental materialized view: > > create incremental materialized view teller_sums as select > t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on > a.bid=t.bid group by t.tid; > SELECT 1000 > Time: 20805.230 ms (00:20.805) > > 20 second is reasonable time, comparable with time of database > initialization. > > Then obviously we see advantages of precalculated aggregates: > > postgres=# select * from teller_sums where tid=1; > tid | sum > -----+-------- > 1 | -96427 > (1 row) > > Time: 0.871 ms > postgres=# select t.tid,sum(abalance) from pgbench_accounts a join > pgbench_tellers t on a.bid=t.bid group by t.tid having t.tid=1 > ; > tid | sum > -----+-------- > 1 | -96427 > (1 row) > > Time: 915.508 ms > > Amazing. Almost 1000 times difference! > > 3. Run pgbench once again: > > Ooops! Now TPS are much lower: > > tps = 141.767347 (including connections establishing) > > Speed of updates is reduced more than 70 times! > Looks like we loose parallelism because almost the same result I get > with just one connection. How much TPS do you get if you execute pgbench -c 1 without incremental materialized view defined? If it's around 141 then we could surely confirm that the major bottle neck is locking contention. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On 12.11.2020 14:53, Tatsuo Ishii wrote: >> 1. Create pgbench database with scale 100. >> pgbench speed at my desktop is about 10k TPS: >> >> pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres >> tps = 10194.951827 (including connections establishing) >> >> 2. Then I created incremental materialized view: >> >> create incremental materialized view teller_sums as select >> t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on >> a.bid=t.bid group by t.tid; >> SELECT 1000 >> Time: 20805.230 ms (00:20.805) >> >> 20 second is reasonable time, comparable with time of database >> initialization. >> >> Then obviously we see advantages of precalculated aggregates: >> >> postgres=# select * from teller_sums where tid=1; >> tid | sum >> -----+-------- >> 1 | -96427 >> (1 row) >> >> Time: 0.871 ms >> postgres=# select t.tid,sum(abalance) from pgbench_accounts a join >> pgbench_tellers t on a.bid=t.bid group by t.tid having t.tid=1 >> ; >> tid | sum >> -----+-------- >> 1 | -96427 >> (1 row) >> >> Time: 915.508 ms >> >> Amazing. Almost 1000 times difference! >> >> 3. Run pgbench once again: >> >> Ooops! Now TPS are much lower: >> >> tps = 141.767347 (including connections establishing) >> >> Speed of updates is reduced more than 70 times! >> Looks like we loose parallelism because almost the same result I get >> with just one connection. > How much TPS do you get if you execute pgbench -c 1 without > incremental materialized view defined? If it's around 141 then we > could surely confirm that the major bottle neck is locking contention. > My desktop has just 4 physical cores, so performance with one connection is about 2k TPS: pgbench -M prepared -N -c 1 -T 60 -P 1 postgres tps = 1949.233532 (including connections establishing) So there is still large gap (~14 times) between insert speed with/without incremental view. I did more investigations and found out that one pf the reasons of bad performance in this case is lack of index on materialized view, so update has to perform sequential scan through 1000 elements. Well, creation of proper indexes for table is certainly responsibility of DBA. But users may not consider materialized view as normal table. So the idea that index should be explicitly created for materialized view seems to be not so obvious. From the other side, implementation of materialized view knows which index is needed for performing efficient incremental update. I wonder if it can create such index itself implicitly or at least produce notice with proposal to create such index. In any case, after creation of index on tid column of materialized view, pgbench speed is increased from 141 to 331 TPS (more than two times). It is with single connection. But if I run pgbench with 10 connections, then performance is even slightly slower: 289 TPS. I looked throw your patch for exclusive table locks and found this fragment in matview.c: /* * Wait for concurrent transactions which update this materialized view at * READ COMMITED. This is needed to see changes committed in other * transactions. No wait and raise an error at REPEATABLE READ or * SERIALIZABLE to prevent update anomalies of matviews. * XXX: dead-lock is possible here. */ if (!IsolationUsesXactSnapshot()) LockRelationOid(matviewOid, ExclusiveLock); else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock)) I replaced it with RowExlusiveLock and ... got 1437 TPS with 10 connections. It is still about 7 times slower than performance without incremental view. But now the gap is not so dramatic. And it seems to be clear that this exclusive lock on matview is real show stopper for concurrent updates. I do not know which race conditions and anomalies we can get if replace table-level lock with row-level lock here. But I think that this problem should be addressed in any case: single client update mode is very rare scenario. I attached to this mail profile of pgbench workload with defined incremental view (with index). May be you will find it useful. One more disappointing observation of materialized views (now non-incremental). Time of creation of non-incremental materialized view is about 18 seconds: postgres=# create materialized view teller_avgs as select t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on a.bid=t.bid group by t.tid; SELECT 1000 Time: 17795.395 ms (00:17.795) But refresh of such view takes 55 seconds: postgres=# refresh materialized view teller_avgs; REFRESH MATERIALIZED VIEW Time: 55500.381 ms (00:55.500) And refresh time doesn't depend on amount of updates since last refresh: I got almost the same time when I ran pgbench for one minute before refresh and when two refreshes are performed subsequently. Adding index doesn't help much in this case and concurrent refresh is even slower: postgres=# refresh materialized view concurrently teller_avgs; REFRESH MATERIALIZED VIEW Time: 56981.772 ms (00:56.982) So it seems to be more efficient to drop and recreate materialized view rather than refresh it. At least in this case. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Wed, 11 Nov 2020 19:10:35 +0300 Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: Thank you for reviewing this patch! > > The patch is not applied to the current master because makeFuncCall > prototype is changed, > I fixed it by adding COAERCE_CALL_EXPLICIT. The rebased patch was submitted. > Ooops! Now TPS are much lower: > > tps = 141.767347 (including connections establishing) > > Speed of updates is reduced more than 70 times! > Looks like we loose parallelism because almost the same result I get > with just one connection. As you and Ishii-san mentioned in other posts, I think the reason would be a table lock on the materialized view that is acquired during view maintenance. I will explain more a bit in another post. > 4. Finally let's create one more view (it is reasonable to expect that > analytics will run many different queries and so need multiple views). > > create incremental materialized view teller_avgs as select > t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on > a.bid=t.bid group by t.tid; > > It is great that not only simple aggregates like SUM are supported, but > also AVG. > But insertion speed now is reduced twice - 72TPS. Yes, the current implementation takes twice time for updating a table time when a new incrementally maintainable materialized view is defined on the table because view maintenance is performed for each view. > > So good news is that incremental materialized views really work. > And bad news is that maintenance overhead is too large which > significantly restrict applicability of this approach. > Certainly in case of dominated read-only workload such materialized > views can significantly improve performance. > But unfortunately my dream that them allow to combine OLAP+OLPT is not > currently realized. As you concluded, there is a large overhead on updating base tables in the current implementation because it is immediate maintenance in which the view is updated in the same sentence where its base table is modified. Therefore, this is not suitable to OLTP workload where there are frequent updates of tables. For suppressing maintenance overhead in such workload, we have to implement "deferred maintenance" which collects table change logs and updates the view in another transaction afterward. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Thu, 12 Nov 2020 15:37:42 +0300 Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > Well, creation of proper indexes for table is certainly responsibility > of DBA. > But users may not consider materialized view as normal table. So the > idea that index should > be explicitly created for materialized view seems to be not so obvious. > From the other side, implementation of materialized view knows which > index is needed for performing efficient incremental update. > I wonder if it can create such index itself implicitly or at least > produce notice with proposal to create such index. That makes sense. Especially for aggregate views, it is obvious that creating an index on expressions used in GROUP BY is effective. For other views, creating an index on columns that come from primary keys of base tables would be effective if any. However, if any base table doesn't have a primary or unique key or such key column is not contained in the view's target list, it is hard to decide an appropriate index on the view. We can create an index on all columns in the target list, but it could cause overhead on view maintenance. So, just producing notice would be better for such cases. > I looked throw your patch for exclusive table locks and found this > fragment in matview.c: > > /* > * Wait for concurrent transactions which update this materialized > view at > * READ COMMITED. This is needed to see changes committed in other > * transactions. No wait and raise an error at REPEATABLE READ or > * SERIALIZABLE to prevent update anomalies of matviews. > * XXX: dead-lock is possible here. > */ > if (!IsolationUsesXactSnapshot()) > LockRelationOid(matviewOid, ExclusiveLock); > else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock)) > > > I replaced it with RowExlusiveLock and ... got 1437 TPS with 10 connections. > It is still about 7 times slower than performance without incremental view. > But now the gap is not so dramatic. And it seems to be clear that this > exclusive lock on matview is real show stopper for concurrent updates. > I do not know which race conditions and anomalies we can get if replace > table-level lock with row-level lock here. I explained it here: https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp For example, suppose there is a view V = R*S that joins tables R and S, and there are two concurrent transactions T1 which changes table R to R' and T2 which changes S to S'. Without any lock, in READ COMMITTED mode, V would be updated to R'*S in T1, and R*S' in T2, so it would cause inconsistency. By locking the view V, transactions T1, T2 are processed serially and this inconsistency can be avoided. Especially, suppose that tuple dR is inserted into R in T1, and dS is inserted into S in T2, where dR and dS will be joined in according to the view definition. In this situation, without any lock, the change of V is computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not be included in the results. This inconsistency could not be resolved by row-level lock. > But I think that this problem should be addressed in any case: single > client update mode is very rare scenario. This behavior is explained in rules.sgml like this: +<sect2> +<title>Concurrent Transactions</title> +<para> + Suppose an <acronym>IMMV</acronym> is defined on two base tables and each + table was modified in different a concurrent transaction simultaneously. + In the transaction which was committed first, <acronym>IMMV</acronym> can + be updated considering only the change which happened in this transaction. + On the other hand, in order to update the view correctly in the transaction + which was committed later, we need to know the changes occurred in + both transactions. For this reason, <literal>ExclusiveLock</literal> + is held on an <acronym>IMMV</acronym> immediately after a base table is + modified in <literal>READ COMMITTED</literal> mode to make sure that + the <acronym>IMMV</acronym> is updated in the latter transaction after + the former transaction is committed. In <literal>REPEATABLE READ</literal> + or <literal>SERIALIZABLE</literal> mode, an error is raised immediately + if lock acquisition fails because any changes which occurred in + other transactions are not be visible in these modes and + <acronym>IMMV</acronym> cannot be updated correctly in such situations. +</para> +</sect2> Hoever, should we describe explicitly its impact on performance here? > I attached to this mail profile of pgbench workload with defined > incremental view (with index). > May be you will find it useful. Thank you for your profiling! Hmm, it shows that overhead of executing query for calculating the delta (refresh_mateview_datfill) and applying the delta (SPI_exec) is large.... I will investigate if more optimizations to reduce the overhead is possible. > > One more disappointing observation of materialized views (now > non-incremental). > Time of creation of non-incremental materialized view is about 18 seconds: > > postgres=# create materialized view teller_avgs as select > t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on > a.bid=t.bid group by t.tid; > SELECT 1000 > Time: 17795.395 ms (00:17.795) > > But refresh of such view takes 55 seconds: > > postgres=# refresh materialized view teller_avgs; > REFRESH MATERIALIZED VIEW > Time: 55500.381 ms (00:55.500) Hmm, interesting... I would like to investigate this issue, too. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On 24.11.2020 12:21, Yugo NAGATA wrote: > >> I replaced it with RowExlusiveLock and ... got 1437 TPS with 10 connections. >> It is still about 7 times slower than performance without incremental view. >> But now the gap is not so dramatic. And it seems to be clear that this >> exclusive lock on matview is real show stopper for concurrent updates. >> I do not know which race conditions and anomalies we can get if replace >> table-level lock with row-level lock here. > I explained it here: > https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp > > For example, suppose there is a view V = R*S that joins tables R and S, > and there are two concurrent transactions T1 which changes table R to R' > and T2 which changes S to S'. Without any lock, in READ COMMITTED mode, > V would be updated to R'*S in T1, and R*S' in T2, so it would cause > inconsistency. By locking the view V, transactions T1, T2 are processed > serially and this inconsistency can be avoided. > > Especially, suppose that tuple dR is inserted into R in T1, and dS is > inserted into S in T2, where dR and dS will be joined in according to > the view definition. In this situation, without any lock, the change of V is > computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not > be included in the results. This inconsistency could not be resolved by > row-level lock. > >> But I think that this problem should be addressed in any case: single >> client update mode is very rare scenario. > This behavior is explained in rules.sgml like this: > > +<sect2> > +<title>Concurrent Transactions</title> > +<para> > + Suppose an <acronym>IMMV</acronym> is defined on two base tables and each > + table was modified in different a concurrent transaction simultaneously. > + In the transaction which was committed first, <acronym>IMMV</acronym> can > + be updated considering only the change which happened in this transaction. > + On the other hand, in order to update the view correctly in the transaction > + which was committed later, we need to know the changes occurred in > + both transactions. For this reason, <literal>ExclusiveLock</literal> > + is held on an <acronym>IMMV</acronym> immediately after a base table is > + modified in <literal>READ COMMITTED</literal> mode to make sure that > + the <acronym>IMMV</acronym> is updated in the latter transaction after > + the former transaction is committed. In <literal>REPEATABLE READ</literal> > + or <literal>SERIALIZABLE</literal> mode, an error is raised immediately > + if lock acquisition fails because any changes which occurred in > + other transactions are not be visible in these modes and > + <acronym>IMMV</acronym> cannot be updated correctly in such situations. > +</para> > +</sect2> > > Hoever, should we describe explicitly its impact on performance here? > Sorry, I didn't think much about this problem. But I think that it is very important to try to find some solution of the problem. The most obvious optimization is not to use exclusive table lock if view depends just on one table (contains no joins). Looks like there are no any anomalies in this case, are there? Yes, most analytic queries contain joins (just two queries among 22 TPC-H have no joins). So may be this optimization will not help much. I wonder if it is possible to somehow use predicate locking mechanism of Postgres to avoid this anomalies without global lock? -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, 24 Nov 2020 12:46:57 +0300 Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > > On 24.11.2020 12:21, Yugo NAGATA wrote: > > > >> I replaced it with RowExlusiveLock and ... got 1437 TPS with 10 connections. > >> It is still about 7 times slower than performance without incremental view. > >> But now the gap is not so dramatic. And it seems to be clear that this > >> exclusive lock on matview is real show stopper for concurrent updates. > >> I do not know which race conditions and anomalies we can get if replace > >> table-level lock with row-level lock here. > > I explained it here: > > https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp > > > > For example, suppose there is a view V = R*S that joins tables R and S, > > and there are two concurrent transactions T1 which changes table R to R' > > and T2 which changes S to S'. Without any lock, in READ COMMITTED mode, > > V would be updated to R'*S in T1, and R*S' in T2, so it would cause > > inconsistency. By locking the view V, transactions T1, T2 are processed > > serially and this inconsistency can be avoided. > > > > Especially, suppose that tuple dR is inserted into R in T1, and dS is > > inserted into S in T2, where dR and dS will be joined in according to > > the view definition. In this situation, without any lock, the change of V is > > computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not > > be included in the results. This inconsistency could not be resolved by > > row-level lock. > > > >> But I think that this problem should be addressed in any case: single > >> client update mode is very rare scenario. > > This behavior is explained in rules.sgml like this: > > > > +<sect2> > > +<title>Concurrent Transactions</title> > > +<para> > > + Suppose an <acronym>IMMV</acronym> is defined on two base tables and each > > + table was modified in different a concurrent transaction simultaneously. > > + In the transaction which was committed first, <acronym>IMMV</acronym> can > > + be updated considering only the change which happened in this transaction. > > + On the other hand, in order to update the view correctly in the transaction > > + which was committed later, we need to know the changes occurred in > > + both transactions. For this reason, <literal>ExclusiveLock</literal> > > + is held on an <acronym>IMMV</acronym> immediately after a base table is > > + modified in <literal>READ COMMITTED</literal> mode to make sure that > > + the <acronym>IMMV</acronym> is updated in the latter transaction after > > + the former transaction is committed. In <literal>REPEATABLE READ</literal> > > + or <literal>SERIALIZABLE</literal> mode, an error is raised immediately > > + if lock acquisition fails because any changes which occurred in > > + other transactions are not be visible in these modes and > > + <acronym>IMMV</acronym> cannot be updated correctly in such situations. > > +</para> > > +</sect2> > > > > Hoever, should we describe explicitly its impact on performance here? > > > > Sorry, I didn't think much about this problem. > But I think that it is very important to try to find some solution of > the problem. > The most obvious optimization is not to use exclusive table lock if view > depends just on one table (contains no joins). > Looks like there are no any anomalies in this case, are there? Thank you for your suggestion! That makes sense. > Yes, most analytic queries contain joins (just two queries among 22 > TPC-H have no joins). > So may be this optimization will not help much. Yes, but if a user want to incrementally maintain only aggregate views on a large table, like TPC-H Q1, it will be helpful. For this optimization, we have to only check the number of RTE in the rtable list and it would be cheap. > I wonder if it is possible to somehow use predicate locking mechanism of > Postgres to avoid this anomalies without global lock? You mean that, ,instead of using any table lock, if any possibility of the anomaly is detected using predlock mechanism then abort the transaction? I don't have concrete idea to implement it and know if it is possible yet, but I think it is worth to consider this. Thanks. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On 24.11.2020 13:11, Yugo NAGATA wrote: > >> I wonder if it is possible to somehow use predicate locking mechanism of >> Postgres to avoid this anomalies without global lock? > You mean that, ,instead of using any table lock, if any possibility of the > anomaly is detected using predlock mechanism then abort the transaction? Yes. If both transactions are using serializable isolation level, then lock is not needed, isn't it? So at least you can add yet another simple optimization: if transaction has serializable isolation level, then exclusive lock is not required. But I wonder if we can go further so that even if transaction is using read-committed or repeatable-read isolation level, we still can replace exclusive table lock with predicate locks. The main problem with this approach (from my point of view) is the predicate locks are able to detect conflict but not able to prevent it. I.e. if such conflict is detected then transaction has to be aborted. And it is not always desirable, especially because user doesn't expect it: how can insertion of single record with unique keys in a table cause transaction conflict? And this is what will happen in your example with transactions T1 and T2 inserting records in R and S tables. And what do you think about backrgound update of materialized view? On update/insert trigger will just add record to some "delta" table and then some background worker will update view. Certainly in this case we loose synchronization between main table and materialized view (last one may contain slightly deteriorated data). But in this case no exclusive lock is needed, isn't it? -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Wed, 25 Nov 2020 15:16:05 +0300 Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > > On 24.11.2020 13:11, Yugo NAGATA wrote: > > > >> I wonder if it is possible to somehow use predicate locking mechanism of > >> Postgres to avoid this anomalies without global lock? > > You mean that, ,instead of using any table lock, if any possibility of the > > anomaly is detected using predlock mechanism then abort the transaction? > > Yes. If both transactions are using serializable isolation level, then > lock is not needed, isn't it? > So at least you can add yet another simple optimization: if transaction > has serializable isolation level, > then exclusive lock is not required. As long as we use the trigger approach, we can't handle concurrent view maintenance in either repeatable read or serializable isolation level. It is because one transaction (R= R+dR) cannot see changes occurred in another transaction (S'= S+dS) in such cases, and we cannot get the incremental change on the view (dV=dR*dS). Therefore, in the current implementation, the transaction is aborted when the concurrent view maintenance happens in repeatable read or serializable. > But I wonder if we can go further so that even if transaction is using > read-committed or repeatable-read isolation level, > we still can replace exclusive table lock with predicate locks. > > The main problem with this approach (from my point of view) is the > predicate locks are able to detect conflict but not able to prevent it. > I.e. if such conflict is detected then transaction has to be aborted. > And it is not always desirable, especially because user doesn't expect > it: how can insertion of single record with unique keys in a table cause > transaction conflict? > And this is what will happen in your example with transactions T1 and T2 > inserting records in R and S tables. Yes. I wonder that either aborting transaction or waiting on locks is unavoidable when a view is incrementally updated concurrently (at least in the immediate maintenance where a view is update in the same transaction that updates the base table). > And what do you think about backrgound update of materialized view? > On update/insert trigger will just add record to some "delta" table and > then some background worker will update view. > Certainly in this case we loose synchronization between main table and > materialized view (last one may contain slightly deteriorated data). > But in this case no exclusive lock is needed, isn't it? Of course, we are considering this type of view maintenance. This is deferred maintenance where a view is update after the transaction that updates the base tables is committed. Views can be updated in bacground in a appropreate timing or as a response of a user command. To implement this, we needs a mechanism to maintain change logs which records changes of base tables. We think that implementing this infrastructure is not trivial work, so, in the first patch proposal, we decided to start from immediate approach which needs less code. -- Yugo NAGATA <nagata@sraoss.co.jp>
On 25.11.2020 16:06, Yugo NAGATA wrote: > On Wed, 25 Nov 2020 15:16:05 +0300 > Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > >> >> On 24.11.2020 13:11, Yugo NAGATA wrote: >>>> I wonder if it is possible to somehow use predicate locking mechanism of >>>> Postgres to avoid this anomalies without global lock? >>> You mean that, ,instead of using any table lock, if any possibility of the >>> anomaly is detected using predlock mechanism then abort the transaction? >> Yes. If both transactions are using serializable isolation level, then >> lock is not needed, isn't it? >> So at least you can add yet another simple optimization: if transaction >> has serializable isolation level, >> then exclusive lock is not required. > As long as we use the trigger approach, we can't handle concurrent view maintenance > in either repeatable read or serializable isolation level. It is because one > transaction (R= R+dR) cannot see changes occurred in another transaction (S'= S+dS) > in such cases, and we cannot get the incremental change on the view (dV=dR*dS). > Therefore, in the current implementation, the transaction is aborted when the > concurrent view maintenance happens in repeatable read or serializable. Sorry, may be I do not correctly understand you or you do not understand me. Lets consider two serializable transactions (I do not use view or triggers, but perform correspondent updates manually): create table t(pk integer, val int); create table mat_view(gby_key integer primary key, total bigint); insert into t values (1,0),(2,0); insert into mat_view values (1,0),(2,0); Session 1: Session 2: begin isolation level serializable; begin isolation level serializable; insert into t values (1,200); insert into t values (1,300); update mat_view set total=total+200 where gby_key=1; update mat_view set total=total+300 where gby_key=1; <blocked> commit; ERROR: could not serialize access due to concurrent update So both transactions are aborted. It is expected behavior for serializable transactions. But if transactions updating different records of mat_view, then them can be executed concurrently: Session 1: Session 2: begin isolation level serializable; begin isolation level serializable; insert into t values (1,200); insert into t values (2,300); update mat_view set total=total+200 where gby_key=1; update mat_view set total=total+300 where gby_key=2; commit; commit; So, if transactions are using serializable isolation level, then we can update mat view without exclusive lock and if there is not conflict, this transaction can be executed concurrently. Please notice, that exclusive lock doesn't prevent conflict in first case: Session 1: Session 2: begin isolation level serializable; begin isolation level serializable; insert into t values (1,200); insert into t values (1,300); lock table mat_view; update mat_view set total=total+200 where gby_key=1; lock table mat_view; <blocked> commit; update mat_view set total=total+300 where gby_key=1; commit; ERROR: could not serialize access due to concurrent update So do you agree that there are no reasons for using explicit lock for serializable transactions? -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Wed, 25 Nov 2020 18:00:16 +0300 Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > > On 25.11.2020 16:06, Yugo NAGATA wrote: > > On Wed, 25 Nov 2020 15:16:05 +0300 > > Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > > >> > >> On 24.11.2020 13:11, Yugo NAGATA wrote: > >>>> I wonder if it is possible to somehow use predicate locking mechanism of > >>>> Postgres to avoid this anomalies without global lock? > >>> You mean that, ,instead of using any table lock, if any possibility of the > >>> anomaly is detected using predlock mechanism then abort the transaction? > >> Yes. If both transactions are using serializable isolation level, then > >> lock is not needed, isn't it? > >> So at least you can add yet another simple optimization: if transaction > >> has serializable isolation level, > >> then exclusive lock is not required. > > As long as we use the trigger approach, we can't handle concurrent view maintenance > > in either repeatable read or serializable isolation level. It is because one > > transaction (R= R+dR) cannot see changes occurred in another transaction (S'= S+dS) > > in such cases, and we cannot get the incremental change on the view (dV=dR*dS). > > Therefore, in the current implementation, the transaction is aborted when the > > concurrent view maintenance happens in repeatable read or serializable. > > Sorry, may be I do not correctly understand you or you do not understand me. > Lets consider two serializable transactions (I do not use view or > triggers, but perform correspondent updates manually): > > > > create table t(pk integer, val int); > create table mat_view(gby_key integer primary key, total bigint); > insert into t values (1,0),(2,0); > insert into mat_view values (1,0),(2,0); > > Session 1: Session 2: > > begin isolation level serializable; > begin isolation level serializable; > insert into t values (1,200); insert into t > values (1,300); > update mat_view set total=total+200 where gby_key=1; > update mat_view set total=total+300 where gby_key=1; > <blocked> > commit; > ERROR: could not serialize access due to concurrent update > > So both transactions are aborted. > It is expected behavior for serializable transactions. > But if transactions updating different records of mat_view, then them > can be executed concurrently: > > Session 1: Session 2: > > begin isolation level serializable; > begin isolation level serializable; > insert into t values (1,200); insert into t > values (2,300); > update mat_view set total=total+200 where gby_key=1; > update mat_view set total=total+300 where gby_key=2; > commit; commit; > > So, if transactions are using serializable isolation level, then we can > update mat view without exclusive lock > and if there is not conflict, this transaction can be executed concurrently. > > Please notice, that exclusive lock doesn't prevent conflict in first case: > > Session 1: Session 2: > > begin isolation level serializable; > begin isolation level serializable; > insert into t values (1,200); insert into t > values (1,300); > lock table mat_view; > update mat_view set total=total+200 where gby_key=1; > lock table mat_view; > <blocked> > commit; > update mat_view set total=total+300 where gby_key=1; > commit; > ERROR: could not serialize access due to concurrent update > > > So do you agree that there are no reasons for using explicit lock for > serializable transactions? Yes, I agree. I said an anomaly could occur in repeatable read and serializable isolation level, but it was wrong. In serializable, the transaction will be aborted in programmable cases due to predicate locks, and we don't need the lock. However, in repeatable read, the anomaly still could occurs when the table is defined on more than one base tables even if we lock the view. To prevent it, the only way I found is aborting the transaction forcedly in such cases for now. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi, Attached is the revised patch (v20) to add support for Incremental Materialized View Maintenance (IVM). In according with Konstantin's suggestion, I made a few optimizations. 1. Creating an index on the matview automatically When creating incremental maintainable materialized view (IMMV)s, a unique index on IMMV is created automatically if possible. If the view definition query has a GROUP BY clause, the index is created on the columns of GROUP BY expressions. Otherwise, if the view contains all primary key attributes of its base tables in the target list, the index is created on these attributes. Also, if the view has DISTINCT, a unique index is created on all columns in the target list. In other cases, no index is created. In all cases, a NOTICE message is output to inform users that an index is created or that an appropriate index is necessary for efficient IVM. 2. Use a weaker lock on the matview if possible If the view has only one base table in this query, RowExclusiveLock is held on the view instead of AccessExclusiveLock, because we don't need to wait other concurrent transaction's result in order to maintain the view in this case. When the same row in the view is affected due to concurrent maintenances, a row level lock will protect it. On Tue, 24 Nov 2020 12:46:57 +0300 Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > The most obvious optimization is not to use exclusive table lock if view > depends just on one table (contains no joins). > Looks like there are no any anomalies in this case, are there? I confirmed the effect of this optimizations. First, when I performed pgbench (SF=100) without any materialized views, the results is : pgbench test4 -T 300 -c 8 -j 4 latency average = 6.493 ms tps = 1232.146229 (including connections establishing) Next, created a view as below, I performed the same pgbench. CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS SELECT bid, count(abalance), sum(abalance), avg(abalance) FROM pgbench_accounts GROUP BY bid; The result is here: [the previous version (v19 with exclusive table lock)] - latency average = 77.677 ms - tps = 102.990159 (including connections establishing) [In the latest version (v20 with weaker lock)] - latency average = 17.576 ms - tps = 455.159644 (including connections establishing) There is still substantial overhead, but we can see that the effect of the optimization. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
Hi hackers, I heard the opinion that this patch is too big and hard to review. So, I wander that we should downsize the patch by eliminating some features and leaving other basic features. If there are more opinions this makes it easer for reviewers to look at this patch, I would like do it. If so, we plan to support only selection, projection, inner-join, and some aggregates in the first release and leave sub-queries, outer-join, and CTE supports to the next release. Regards, Yugo Nagata On Tue, 22 Dec 2020 21:51:36 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > Hi, > > Attached is the revised patch (v20) to add support for Incremental > Materialized View Maintenance (IVM). > > In according with Konstantin's suggestion, I made a few optimizations. > > 1. Creating an index on the matview automatically > > When creating incremental maintainable materialized view (IMMV)s, > a unique index on IMMV is created automatically if possible. > > If the view definition query has a GROUP BY clause, the index is created > on the columns of GROUP BY expressions. Otherwise, if the view contains > all primary key attributes of its base tables in the target list, the index > is created on these attributes. Also, if the view has DISTINCT, > a unique index is created on all columns in the target list. > In other cases, no index is created. > > In all cases, a NOTICE message is output to inform users that an index is > created or that an appropriate index is necessary for efficient IVM. > > 2. Use a weaker lock on the matview if possible > > If the view has only one base table in this query, RowExclusiveLock is > held on the view instead of AccessExclusiveLock, because we don't > need to wait other concurrent transaction's result in order to > maintain the view in this case. When the same row in the view is > affected due to concurrent maintenances, a row level lock will > protect it. > > On Tue, 24 Nov 2020 12:46:57 +0300 > Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > > The most obvious optimization is not to use exclusive table lock if view > > depends just on one table (contains no joins). > > Looks like there are no any anomalies in this case, are there? > > I confirmed the effect of this optimizations. > > First, when I performed pgbench (SF=100) without any materialized views, > the results is : > > pgbench test4 -T 300 -c 8 -j 4 > latency average = 6.493 ms > tps = 1232.146229 (including connections establishing) > > Next, created a view as below, I performed the same pgbench. > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS > SELECT bid, count(abalance), sum(abalance), avg(abalance) > FROM pgbench_accounts GROUP BY bid; > > The result is here: > > [the previous version (v19 with exclusive table lock)] > - latency average = 77.677 ms > - tps = 102.990159 (including connections establishing) > > [In the latest version (v20 with weaker lock)] > - latency average = 17.576 ms > - tps = 455.159644 (including connections establishing) > > There is still substantial overhead, but we can see that the effect > of the optimization. > > Regards, > Yugo Nagata > > -- > Yugo NAGATA <nagata@sraoss.co.jp> -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi Yugo, > 1. Creating an index on the matview automatically Nice. > 2. Use a weaker lock on the matview if possible > > If the view has only one base table in this query, RowExclusiveLock is > held on the view instead of AccessExclusiveLock, because we don't > need to wait other concurrent transaction's result in order to > maintain the view in this case. When the same row in the view is > affected due to concurrent maintenances, a row level lock will > protect it. > > On Tue, 24 Nov 2020 12:46:57 +0300 > Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > >> The most obvious optimization is not to use exclusive table lock if view >> depends just on one table (contains no joins). >> Looks like there are no any anomalies in this case, are there? > > I confirmed the effect of this optimizations. > > First, when I performed pgbench (SF=100) without any materialized views, > the results is : > > pgbench test4 -T 300 -c 8 -j 4 > latency average = 6.493 ms > tps = 1232.146229 (including connections establishing) > > Next, created a view as below, I performed the same pgbench. > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS > SELECT bid, count(abalance), sum(abalance), avg(abalance) > FROM pgbench_accounts GROUP BY bid; > > The result is here: > > [the previous version (v19 with exclusive table lock)] > - latency average = 77.677 ms > - tps = 102.990159 (including connections establishing) > > [In the latest version (v20 with weaker lock)] > - latency average = 17.576 ms > - tps = 455.159644 (including connections establishing) > > There is still substantial overhead, but we can see that the effect > of the optimization. Great improvement. Now with this patch more than 4x faster than previous one! Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hi, Attached is the revised patch (v21) to add support for Incremental Materialized View Maintenance (IVM). In addition to some typos in the previous enhancement, I fixed a check to prevent a view from containing an expression including aggregates like sum(x)/sum(y) in this revision. Regards, Yugo Nagata On Tue, 22 Dec 2020 22:24:22 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > Hi hackers, > > I heard the opinion that this patch is too big and hard to review. > So, I wander that we should downsize the patch by eliminating some > features and leaving other basic features. > > If there are more opinions this makes it easer for reviewers to look > at this patch, I would like do it. If so, we plan to support only > selection, projection, inner-join, and some aggregates in the first > release and leave sub-queries, outer-join, and CTE supports to the > next release. > > Regards, > Yugo Nagata > > On Tue, 22 Dec 2020 21:51:36 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > Hi, > > > > Attached is the revised patch (v20) to add support for Incremental > > Materialized View Maintenance (IVM). > > > > In according with Konstantin's suggestion, I made a few optimizations. > > > > 1. Creating an index on the matview automatically > > > > When creating incremental maintainable materialized view (IMMV)s, > > a unique index on IMMV is created automatically if possible. > > > > If the view definition query has a GROUP BY clause, the index is created > > on the columns of GROUP BY expressions. Otherwise, if the view contains > > all primary key attributes of its base tables in the target list, the index > > is created on these attributes. Also, if the view has DISTINCT, > > a unique index is created on all columns in the target list. > > In other cases, no index is created. > > > > In all cases, a NOTICE message is output to inform users that an index is > > created or that an appropriate index is necessary for efficient IVM. > > > > 2. Use a weaker lock on the matview if possible > > > > If the view has only one base table in this query, RowExclusiveLock is > > held on the view instead of AccessExclusiveLock, because we don't > > need to wait other concurrent transaction's result in order to > > maintain the view in this case. When the same row in the view is > > affected due to concurrent maintenances, a row level lock will > > protect it. > > > > On Tue, 24 Nov 2020 12:46:57 +0300 > > Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > > > > The most obvious optimization is not to use exclusive table lock if view > > > depends just on one table (contains no joins). > > > Looks like there are no any anomalies in this case, are there? > > > > I confirmed the effect of this optimizations. > > > > First, when I performed pgbench (SF=100) without any materialized views, > > the results is : > > > > pgbench test4 -T 300 -c 8 -j 4 > > latency average = 6.493 ms > > tps = 1232.146229 (including connections establishing) > > > > Next, created a view as below, I performed the same pgbench. > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS > > SELECT bid, count(abalance), sum(abalance), avg(abalance) > > FROM pgbench_accounts GROUP BY bid; > > > > The result is here: > > > > [the previous version (v19 with exclusive table lock)] > > - latency average = 77.677 ms > > - tps = 102.990159 (including connections establishing) > > > > [In the latest version (v20 with weaker lock)] > > - latency average = 17.576 ms > > - tps = 455.159644 (including connections establishing) > > > > There is still substantial overhead, but we can see that the effect > > of the optimization. > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > > -- > Yugo NAGATA <nagata@sraoss.co.jp> > > -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
Hi, Attached is a revised patch (v22) rebased for the latest master head. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
Hi, Attached is a rebased patch (v22a). Ragards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
Hi,
Attached is a rebased patch (v22a).
On Thu, 18 Feb 2021 19:38:44 +0800 Andy Fan <zhihui.fan1213@gmail.com> wrote: > On Tue, Feb 16, 2021 at 9:33 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > Hi, > > > > Attached is a rebased patch (v22a). > > > > Thanks for the patch. Will you think posting a patch with the latest commit > at that > time is helpful? If so, when others want to review it, they know which > commit to > apply the patch without asking for a new rebase usually. I rebased the patch because cfbot failed. http://cfbot.cputube.org/ Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi, I rebased the patch because the cfbot failed. Regards, Yugo Nagata On Tue, 9 Mar 2021 17:27:50 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Tue, 9 Mar 2021 09:20:49 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > On Mon, 8 Mar 2021 15:42:00 -0500 > > Andrew Dunstan <andrew@dunslane.net> wrote: > > > > > > > > On 2/18/21 9:01 PM, Yugo NAGATA wrote: > > > > On Thu, 18 Feb 2021 19:38:44 +0800 > > > > Andy Fan <zhihui.fan1213@gmail.com> wrote: > > > > > > > >> On Tue, Feb 16, 2021 at 9:33 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > >> > > > >>> Hi, > > > >>> > > > >>> Attached is a rebased patch (v22a). > > > >>> > > > >> Thanks for the patch. Will you think posting a patch with the latest commit > > > >> at that > > > >> time is helpful? If so, when others want to review it, they know which > > > >> commit to > > > >> apply the patch without asking for a new rebase usually. > > > > I rebased the patch because cfbot failed. > > > > http://cfbot.cputube.org/ > > > > > > > > > > It's bitrotted a bit more dues to commits bb437f995d and 25936fd46c > > > > Thank you for letting me konw. I'll rebase it soon. > > Done. Attached is a rebased patch set. > > Regards, > Yugo Nagata > > -- > Yugo NAGATA <nagata@sraoss.co.jp> -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
On 4/7/21 5:25 AM, Yugo NAGATA wrote: > Hi, > > I rebased the patch because the cfbot failed. > > Regards, > Yugo Nagata This patch (v22c) just crashed for me with an assertion failure on Fedora 31. Here's the stack trace: [New LWP 333090] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `postgres: andrew regression [local] INSERT '. Program terminated with signal SIGABRT, Aborted. #0 0x00007f8981caa625 in raise () from /lib64/libc.so.6 #0 0x00007f8981caa625 in raise () from /lib64/libc.so.6 #1 0x00007f8981c938d9 in abort () from /lib64/libc.so.6 #2 0x000000000094a54a in ExceptionalCondition (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText != NULL", errorType=errorType@entry=0x99b468 "FailedAssertion", fileName=fileName@entry=0xa91468 "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c", lineNumber=lineNumber@entry=199) at /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69 #3 0x00000000006c0e17 in standard_ExecutorStart (queryDesc=0x226af98, eflags=0) at /home/andrew/pgl/pg_head/src/backend/executor/execMain.c:199 #4 0x00000000006737b2 in refresh_matview_datafill (dest=0x21cf428, query=<optimized out>, queryEnv=0x2245fd0, resultTupleDesc=0x7ffd5e764888, queryString=0x0) at /home/andrew/pgl/pg_head/src/backend/commands/matview.c:719 #5 0x0000000000678042 in calc_delta (queryEnv=0x2245fd0, tupdesc_new=0x7ffd5e764888, tupdesc_old=0x7ffd5e764880, dest_new=0x21cf428, dest_old=0x0, query=0x2246108, rte_path=0x2228a60, table=<optimized out>) at /home/andrew/pgl/pg_head/src/backend/commands/matview.c:2907 #6 IVM_immediate_maintenance (fcinfo=<optimized out>) at /home/andrew/pgl/pg_head/src/backend/commands/matview.c:1683 #7 0x000000000069e483 in ExecCallTriggerFunc (trigdata=0x7ffd5e764bb0, tgindx=2, finfo=0x22345f8, instr=0x0, per_tuple_context=0x2245eb0) at /home/andrew/pgl/pg_head/src/backend/commands/trigger.c:2142 #8 0x000000000069fc4c in AfterTriggerExecute (trigdesc=0x2233db8, trigdesc=0x2233db8, trig_tuple_slot2=0x0, trig_tuple_slot1=0x0, per_tuple_context=0x2245eb0, instr=0x0, finfo=0x2234598, relInfo=0x2233ba0, event=0x222d380, estate=0x2233710) at /home/andrew/pgl/pg_head/src/backend/commands/trigger.c:4041 #9 afterTriggerInvokeEvents (events=0x21cece8, firing_id=1, estate=0x2233710, delete_ok=false) at /home/andrew/pgl/pg_head/src/backend/commands/trigger.c:4255 #10 0x00000000006a4173 in AfterTriggerEndQuery (estate=estate@entry=0x2233710) at /home/andrew/pgl/pg_head/src/backend/commands/trigger.c:4632 #11 0x00000000006c04c8 in standard_ExecutorFinish (queryDesc=0x2237300) at /home/andrew/pgl/pg_head/src/backend/executor/execMain.c:436 #12 0x00000000008415d8 in ProcessQuery (plan=<optimized out>, sourceText=0x21490a0 "INSERT INTO mv_base_b VALUES(5,105);", params=0x0, queryEnv=0x0, dest=0x2221010, qc=0x7ffd5e764f00) at /home/andrew/pgl/pg_head/src/backend/tcop/pquery.c:190 #13 0x00000000008417f2 in PortalRunMulti (portal=portal@entry=0x21ac3c0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x2221010, altdest=altdest@entry=0x2221010, qc=qc@entry=0x7ffd5e764f00) at /home/andrew/pgl/pg_head/src/backend/tcop/pquery.c:1267 #14 0x0000000000842415 in PortalRun (portal=portal@entry=0x21ac3c0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x2221010, altdest=altdest@entry=0x2221010, qc=0x7ffd5e764f00) at /home/andrew/pgl/pg_head/src/backend/tcop/pquery.c:779 #15 0x000000000083e3ca in exec_simple_query (query_string=0x21490a0 "INSERT INTO mv_base_b VALUES(5,105);") at /home/andrew/pgl/pg_head/src/backend/tcop/postgres.c:1196 #16 0x0000000000840075 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffd5e765450, dbname=<optimized out>, username=<optimized out>) at /home/andrew/pgl/pg_head/src/backend/tcop/postgres.c:4458 #17 0x00000000007b8054 in BackendRun (port=<optimized out>, port=<optimized out>) at /home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:4488 #18 BackendStartup (port=<optimized out>) at /home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:4210 #19 ServerLoop () at /home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:1742 #20 0x00000000007b8ebf in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x21435c0) at /home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:1414 #21 0x000000000050e030 in main (argc=8, argv=0x21435c0) at /home/andrew/pgl/pg_head/src/backend/main/main.c:209 $1 = {si_signo = 6, si_errno = 0, si_code = -6, _sifields = {_pad = {333090, 500, 0 <repeats 26 times>}, _kill = {si_pid = 333090, si_uid = 500}, _timer = {si_tid = 333090, si_overrun = 500, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 333090, si_uid = 500, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 333090, si_uid = 500, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x1f400051522, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {si_band = 2147483981090, si_fd = 0}}} cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes: > This patch (v22c) just crashed for me with an assertion failure on > Fedora 31. Here's the stack trace: > #2 0x000000000094a54a in ExceptionalCondition > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText != > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion", > fileName=fileName@entry=0xa91468 > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c", > lineNumber=lineNumber@entry=199) at > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69 That assert just got added a few days ago, so that's why the patch seemed OK before. regards, tom lane
On Mon, 19 Apr 2021 17:40:31 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > This patch (v22c) just crashed for me with an assertion failure on > > Fedora 31. Here's the stack trace: > > > #2 0x000000000094a54a in ExceptionalCondition > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText != > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion", > > fileName=fileName@entry=0xa91468 > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c", > > lineNumber=lineNumber@entry=199) at > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69 > > That assert just got added a few days ago, so that's why the patch > seemed OK before. Thank you for letting me know. I'll fix it. -- Yugo NAGATA <nagata@sraoss.co.jp>
On Tue, 20 Apr 2021 09:51:34 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Mon, 19 Apr 2021 17:40:31 -0400 > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Andrew Dunstan <andrew@dunslane.net> writes: > > > This patch (v22c) just crashed for me with an assertion failure on > > > Fedora 31. Here's the stack trace: > > > > > #2 0x000000000094a54a in ExceptionalCondition > > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText != > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion", > > > fileName=fileName@entry=0xa91468 > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c", > > > lineNumber=lineNumber@entry=199) at > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69 > > > > That assert just got added a few days ago, so that's why the patch > > seemed OK before. > > Thank you for letting me know. I'll fix it. Attached is the fixed patch. queryDesc->sourceText cannot be NULL after commit 1111b2668d8, so now we pass an empty string "" for refresh_matview_datafill() instead NULL when maintaining views incrementally. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
On Mon, 26 Apr 2021 15:46:21 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Tue, 20 Apr 2021 09:51:34 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > On Mon, 19 Apr 2021 17:40:31 -0400 > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > Andrew Dunstan <andrew@dunslane.net> writes: > > > > This patch (v22c) just crashed for me with an assertion failure on > > > > Fedora 31. Here's the stack trace: > > > > > > > #2 0x000000000094a54a in ExceptionalCondition > > > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText != > > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion", > > > > fileName=fileName@entry=0xa91468 > > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c", > > > > lineNumber=lineNumber@entry=199) at > > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69 > > > > > > That assert just got added a few days ago, so that's why the patch > > > seemed OK before. > > > > Thank you for letting me know. I'll fix it. > > Attached is the fixed patch. > > queryDesc->sourceText cannot be NULL after commit 1111b2668d8, > so now we pass an empty string "" for refresh_matview_datafill() instead NULL > when maintaining views incrementally. I am sorry, I forgot to include a fix for 8aba9322511. Attached is the fixed version. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
On Mon, 26 Apr 2021 16:03:48 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Mon, 26 Apr 2021 15:46:21 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > On Tue, 20 Apr 2021 09:51:34 +0900 > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > On Mon, 19 Apr 2021 17:40:31 -0400 > > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > > Andrew Dunstan <andrew@dunslane.net> writes: > > > > > This patch (v22c) just crashed for me with an assertion failure on > > > > > Fedora 31. Here's the stack trace: > > > > > > > > > #2 0x000000000094a54a in ExceptionalCondition > > > > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText != > > > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion", > > > > > fileName=fileName@entry=0xa91468 > > > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c", > > > > > lineNumber=lineNumber@entry=199) at > > > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69 > > > > > > > > That assert just got added a few days ago, so that's why the patch > > > > seemed OK before. > > > > > > Thank you for letting me know. I'll fix it. > > > > Attached is the fixed patch. > > > > queryDesc->sourceText cannot be NULL after commit 1111b2668d8, > > so now we pass an empty string "" for refresh_matview_datafill() instead NULL > > when maintaining views incrementally. > > I am sorry, I forgot to include a fix for 8aba9322511. > Attached is the fixed version. Attached is the rebased patch (for 6b8d29419d). -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
On Fri, 7 May 2021 14:14:16 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Mon, 26 Apr 2021 16:03:48 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > On Mon, 26 Apr 2021 15:46:21 +0900 > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > On Tue, 20 Apr 2021 09:51:34 +0900 > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > On Mon, 19 Apr 2021 17:40:31 -0400 > > > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > > > > Andrew Dunstan <andrew@dunslane.net> writes: > > > > > > This patch (v22c) just crashed for me with an assertion failure on > > > > > > Fedora 31. Here's the stack trace: > > > > > > > > > > > #2 0x000000000094a54a in ExceptionalCondition > > > > > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText != > > > > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion", > > > > > > fileName=fileName@entry=0xa91468 > > > > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c", > > > > > > lineNumber=lineNumber@entry=199) at > > > > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69 > > > > > > > > > > That assert just got added a few days ago, so that's why the patch > > > > > seemed OK before. > > > > > > > > Thank you for letting me know. I'll fix it. > > > > > > Attached is the fixed patch. > > > > > > queryDesc->sourceText cannot be NULL after commit 1111b2668d8, > > > so now we pass an empty string "" for refresh_matview_datafill() instead NULL > > > when maintaining views incrementally. > > > > I am sorry, I forgot to include a fix for 8aba9322511. > > Attached is the fixed version. > > Attached is the rebased patch (for 6b8d29419d). I attached a rebased patch. -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
On Mon, May 17, 2021 at 10:08 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > On Fri, 7 May 2021 14:14:16 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > On Mon, 26 Apr 2021 16:03:48 +0900 > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > On Mon, 26 Apr 2021 15:46:21 +0900 > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > On Tue, 20 Apr 2021 09:51:34 +0900 > > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > > On Mon, 19 Apr 2021 17:40:31 -0400 > > > > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > > > > > > Andrew Dunstan <andrew@dunslane.net> writes: > > > > > > > This patch (v22c) just crashed for me with an assertion failure on > > > > > > > Fedora 31. Here's the stack trace: > > > > > > > > > > > > > #2 0x000000000094a54a in ExceptionalCondition > > > > > > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText != > > > > > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion", > > > > > > > fileName=fileName@entry=0xa91468 > > > > > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c", > > > > > > > lineNumber=lineNumber@entry=199) at > > > > > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69 > > > > > > > > > > > > That assert just got added a few days ago, so that's why the patch > > > > > > seemed OK before. > > > > > > > > > > Thank you for letting me know. I'll fix it. > > > > > > > > Attached is the fixed patch. > > > > > > > > queryDesc->sourceText cannot be NULL after commit 1111b2668d8, > > > > so now we pass an empty string "" for refresh_matview_datafill() instead NULL > > > > when maintaining views incrementally. > > > > > > I am sorry, I forgot to include a fix for 8aba9322511. > > > Attached is the fixed version. > > > > Attached is the rebased patch (for 6b8d29419d). > > I attached a rebased patch. The patch does not apply on Head anymore, could you rebase and post a patch. I'm changing the status to "Waiting for Author". Regards, Vignesh
On Wed, 14 Jul 2021 21:22:37 +0530 vignesh C <vignesh21@gmail.com> wrote: > On Mon, May 17, 2021 at 10:08 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > On Fri, 7 May 2021 14:14:16 +0900 > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > On Mon, 26 Apr 2021 16:03:48 +0900 > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > On Mon, 26 Apr 2021 15:46:21 +0900 > > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > > On Tue, 20 Apr 2021 09:51:34 +0900 > > > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > > > > On Mon, 19 Apr 2021 17:40:31 -0400 > > > > > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > > > > > > > > Andrew Dunstan <andrew@dunslane.net> writes: > > > > > > > > This patch (v22c) just crashed for me with an assertion failure on > > > > > > > > Fedora 31. Here's the stack trace: > > > > > > > > > > > > > > > #2 0x000000000094a54a in ExceptionalCondition > > > > > > > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText != > > > > > > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion", > > > > > > > > fileName=fileName@entry=0xa91468 > > > > > > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c", > > > > > > > > lineNumber=lineNumber@entry=199) at > > > > > > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69 > > > > > > > > > > > > > > That assert just got added a few days ago, so that's why the patch > > > > > > > seemed OK before. > > > > > > > > > > > > Thank you for letting me know. I'll fix it. > > > > > > > > > > Attached is the fixed patch. > > > > > > > > > > queryDesc->sourceText cannot be NULL after commit 1111b2668d8, > > > > > so now we pass an empty string "" for refresh_matview_datafill() instead NULL > > > > > when maintaining views incrementally. > > > > > > > > I am sorry, I forgot to include a fix for 8aba9322511. > > > > Attached is the fixed version. > > > > > > Attached is the rebased patch (for 6b8d29419d). > > > > I attached a rebased patch. > > The patch does not apply on Head anymore, could you rebase and post a > patch. I'm changing the status to "Waiting for Author". Ok. I'll update the patch in a few days. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi hackers, On Mon, 19 Jul 2021 09:24:30 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Wed, 14 Jul 2021 21:22:37 +0530 > vignesh C <vignesh21@gmail.com> wrote: > > The patch does not apply on Head anymore, could you rebase and post a > > patch. I'm changing the status to "Waiting for Author". > > Ok. I'll update the patch in a few days. Attached is the latest patch set to add support for Incremental Materialized View Maintenance (IVM) The patches are rebased to the master and also revised with some code cleaning. IVM is a way to make materialized views up-to-date in which only incremental changes are computed and applied on views rather than recomputing the contents from scratch as REFRESH MATERIALIZED VIEW does. IVM can update materialized views more efficiently than recomputation when only small part of the view need updates. The patch set implements a feature so that materialized views could be updated automatically and immediately when a base table is modified. Currently, our IVM implementation supports views which could contain tuple duplicates whose definition includes: - inner and outer joins including self-join - DISTINCT - some built-in aggregate functions (count, sum, agv, min, and max) - a part of subqueries -- simple subqueries in FROM clause -- EXISTS subqueries in WHERE clause - CTEs We hope the IVM feature would be adopted into pg15. However, the size of patch set has grown too large through supporting above features. Therefore, I think it is better to consider only a part of these features for the first release. Especially, I would like propose the following features for pg15. - inner joins including self-join - DISTINCT and views with tuple duplicates - some built-in aggregate functions (count, sum, agv, min, and max) By omitting outer-join, sub-queries, and CTE features, the patch size becomes less than half. I hope this will make a bit easer to review the IVM patch set. Here is a list of separated patches. - 0001: Add a new syntax: CREATE INCREMENTAL MATERIALIZED VIEW - 0002: Add a new column relisivm to pg_class - 0003: Add new deptype option 'm' in pg_depend - 0004: Change trigger.c to allow to prolong life span of tupestores containing Transition Tables generated via AFTER trigger - 0005: Add IVM supprot for pg_dump - 0006: Add IVM support for psql - 0007: Add the basic IVM future: This supports inner joins, DISTINCT, and tuple duplicates. - 0008: Add aggregates (count, sum, avg, min, max) support for IVM - 0009: Add regression tests for IVM - 0010: Add documentation for IVM We could split the patch furthermore if this would make reviews much easer. For example, I think 0007 could be split into the more basic part and the part for handling tuple duplicates. Moreover, 0008 could be split into "min/max" and other aggregates because handling min/max is a bit more complicated than others. I also attached IVM_extra.tar.gz that contains patches for sub-quereis, outer-join, CTE support, just for your information. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp> -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- IVM_extra.tar.gz
- v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v23-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v23-0003-Add-new-deptype-option-m-in-pg_depend-system-cat.patch
- v23-0004-Allow-to-prolong-life-span-of-transition-tables-.patch
- v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v23-0006-Add-Incremental-View-Maintenance-support-to-psql.patch
- v23-0007-Add-Incremental-View-Maintenance-support.patch
- v23-0008-Add-aggregates-support-in-IVM.patch
- v23-0009-Add-regression-tests-for-Incremental-View-Mainte.patch
- v23-0010-Add-documentations-about-Incremental-View-Mainte.patch
Hi hackers,
On Mon, 19 Jul 2021 09:24:30 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> On Wed, 14 Jul 2021 21:22:37 +0530
> vignesh C <vignesh21@gmail.com> wrote:
> > The patch does not apply on Head anymore, could you rebase and post a
> > patch. I'm changing the status to "Waiting for Author".
>
> Ok. I'll update the patch in a few days.
Attached is the latest patch set to add support for Incremental
Materialized View Maintenance (IVM)
The patches are rebased to the master and also revised with some
code cleaning.
IVM is a way to make materialized views up-to-date in which only
incremental changes are computed and applied on views rather than
recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
does. IVM can update materialized views more efficiently
than recomputation when only small part of the view need updates.
The patch set implements a feature so that materialized views could be
updated automatically and immediately when a base table is modified.
Currently, our IVM implementation supports views which could contain
tuple duplicates whose definition includes:
- inner and outer joins including self-join
- DISTINCT
- some built-in aggregate functions (count, sum, agv, min, and max)
- a part of subqueries
-- simple subqueries in FROM clause
-- EXISTS subqueries in WHERE clause
- CTEs
We hope the IVM feature would be adopted into pg15. However, the size of
patch set has grown too large through supporting above features. Therefore,
I think it is better to consider only a part of these features for the first
release. Especially, I would like propose the following features for pg15.
- inner joins including self-join
- DISTINCT and views with tuple duplicates
- some built-in aggregate functions (count, sum, agv, min, and max)
By omitting outer-join, sub-queries, and CTE features, the patch size becomes
less than half. I hope this will make a bit easer to review the IVM patch set.
Here is a list of separated patches.
- 0001: Add a new syntax:
CREATE INCREMENTAL MATERIALIZED VIEW
- 0002: Add a new column relisivm to pg_class
- 0003: Add new deptype option 'm' in pg_depend
- 0004: Change trigger.c to allow to prolong life span of tupestores
containing Transition Tables generated via AFTER trigger
- 0005: Add IVM supprot for pg_dump
- 0006: Add IVM support for psql
- 0007: Add the basic IVM future:
This supports inner joins, DISTINCT, and tuple duplicates.
- 0008: Add aggregates (count, sum, avg, min, max) support for IVM
- 0009: Add regression tests for IVM
- 0010: Add documentation for IVM
We could split the patch furthermore if this would make reviews much easer.
For example, I think 0007 could be split into the more basic part and the part
for handling tuple duplicates. Moreover, 0008 could be split into "min/max"
and other aggregates because handling min/max is a bit more complicated than
others.
I also attached IVM_extra.tar.gz that contains patches for sub-quereis,
outer-join, CTE support, just for your information.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
--
Yugo NAGATA <nagata@sraoss.co.jp>
the target list because tuples to be updated in IMMV are identified
Hi Nagata-san, I am interested in this patch since it is good feature. I run some simple tests. I found the following problems. (1) Failed to "make world". I think there are extra "<lineitem>" in doc/src/sgml/ref/create_materialized_view.sgml (line 110 and 117) (2) In the case of partition, it seems that IVM does not work well. I run as follows. postgres=# create table parent (c int) partition by range (c); CREATE TABLE postgres=# create table child partition of parent for values from (1) to (100); CREATE TABLE postgres=# create incremental materialized view ivm_parent as select c from parent; NOTICE: could not create an index on materialized view "ivm_parent" automatically HINT: Create an index on the materialized view for efficient incremental maintenance. SELECT 0 postgres=# create incremental materialized view ivm_child as select c from child; NOTICE: could not create an index on materialized view "ivm_child" automatically HINT: Create an index on the materialized view for efficient incremental maintenance. SELECT 0 postgres=# insert into parent values (1); INSERT 0 1 postgres=# insert into child values (2); INSERT 0 1 postgres=# select * from parent; c --- 1 2 (2 rows) postgres=# select * from child; c --- 1 2 (2 rows) postgres=# select * from ivm_parent; c --- 1 (1 row) postgres=# select * from ivm_child; c --- 2 (1 row) I think ivm_parent and ivm_child should return 2 rows. (3) I think IVM does not support foreign table, but try to make IVM. postgres=# create incremental materialized view ivm_foreign as select c from foreign_table; NOTICE: could not create an index on materialized view "ivm_foreign" automatically HINT: Create an index on the materialized view for efficient incremental maintenance. ERROR: "foreign_table" is a foreign table DETAIL: Triggers on foreign tables cannot have transition tables. It finally failed to make IVM, but I think it should be checked more early. Regards, Ryohei Takahashi
Hello Zhihong Yu, On Mon, 2 Aug 2021 14:33:46 -0700 Zhihong Yu <zyu@yugabyte.com> wrote: > On Sun, Aug 1, 2021 at 11:30 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > Hi hackers, > > > > On Mon, 19 Jul 2021 09:24:30 +0900 > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > On Wed, 14 Jul 2021 21:22:37 +0530 > > > vignesh C <vignesh21@gmail.com> wrote: > > > > > > The patch does not apply on Head anymore, could you rebase and post a > > > > patch. I'm changing the status to "Waiting for Author". > > > > > > Ok. I'll update the patch in a few days. > > > > Attached is the latest patch set to add support for Incremental > > Materialized View Maintenance (IVM) > > > > The patches are rebased to the master and also revised with some > > code cleaning. > > > > IVM is a way to make materialized views up-to-date in which only > > incremental changes are computed and applied on views rather than > > recomputing the contents from scratch as REFRESH MATERIALIZED VIEW > > does. IVM can update materialized views more efficiently > > than recomputation when only small part of the view need updates. > > > > The patch set implements a feature so that materialized views could be > > updated automatically and immediately when a base table is modified. > > > > Currently, our IVM implementation supports views which could contain > > tuple duplicates whose definition includes: > > > > - inner and outer joins including self-join > > - DISTINCT > > - some built-in aggregate functions (count, sum, agv, min, and max) > > - a part of subqueries > > -- simple subqueries in FROM clause > > -- EXISTS subqueries in WHERE clause > > - CTEs > > > > We hope the IVM feature would be adopted into pg15. However, the size of > > patch set has grown too large through supporting above features. > > Therefore, > > I think it is better to consider only a part of these features for the > > first > > release. Especially, I would like propose the following features for pg15. > > > > - inner joins including self-join > > - DISTINCT and views with tuple duplicates > > - some built-in aggregate functions (count, sum, agv, min, and max) > > > > By omitting outer-join, sub-queries, and CTE features, the patch size > > becomes > > less than half. I hope this will make a bit easer to review the IVM patch > > set. > > > > Here is a list of separated patches. > > > > - 0001: Add a new syntax: > > CREATE INCREMENTAL MATERIALIZED VIEW > > - 0002: Add a new column relisivm to pg_class > > - 0003: Add new deptype option 'm' in pg_depend > > - 0004: Change trigger.c to allow to prolong life span of tupestores > > containing Transition Tables generated via AFTER trigger > > - 0005: Add IVM supprot for pg_dump > > - 0006: Add IVM support for psql > > - 0007: Add the basic IVM future: > > This supports inner joins, DISTINCT, and tuple duplicates. > > - 0008: Add aggregates (count, sum, avg, min, max) support for IVM > > - 0009: Add regression tests for IVM > > - 0010: Add documentation for IVM > > > > We could split the patch furthermore if this would make reviews much > > easer. > > For example, I think 0007 could be split into the more basic part and the > > part > > for handling tuple duplicates. Moreover, 0008 could be split into "min/max" > > and other aggregates because handling min/max is a bit more complicated > > than > > others. > > > > I also attached IVM_extra.tar.gz that contains patches for sub-quereis, > > outer-join, CTE support, just for your information. > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > > Hi, > For v23-0008-Add-aggregates-support-in-IVM.patch : Thank you for looking into this! > As a restriction, expressions specified in GROUP BY must appear in > the target list because tuples to be updated in IMMV are identified > by using this group keys. > > IMMV -> IMVM (Incremental Materialized View Maintenance, as said above) > Or maybe it means 'incrementally maintainable materialized view'. It would > be better to use the same abbreviation. IMMV is correct in the commit message of this patch. Rather, IMVM used in v23-0003-Add-new-deptype-option-m-in-pg_depend-system-cat.patch should be corrected to IMMV. > this group keys -> this group key > > + errmsg("GROUP BY expression not appeared in select > list is not supported on incrementally maintainable materialized view"))); > > expression not appeared in select list -> expression not appearing in > select list > > + * For aggregate functions except to count > > except to count -> except count Thank you for pointing out them. I'll fix. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hello Takahashi-san, On Tue, 3 Aug 2021 10:15:42 +0000 "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > Hi Nagata-san, > > > I am interested in this patch since it is good feature. > > I run some simple tests. > I found the following problems. Thank you for your interest for this patch! > (1) > Failed to "make world". > I think there are extra "<lineitem>" in doc/src/sgml/ref/create_materialized_view.sgml > (line 110 and 117) Oops. I'll fix it. > (2) > In the case of partition, it seems that IVM does not work well. > I run as follows. > > postgres=# create table parent (c int) partition by range (c); > CREATE TABLE > postgres=# create table child partition of parent for values from (1) to (100); > CREATE TABLE > postgres=# create incremental materialized view ivm_parent as select c from parent; > NOTICE: could not create an index on materialized view "ivm_parent" automatically > HINT: Create an index on the materialized view for efficient incremental maintenance. > SELECT 0 > postgres=# create incremental materialized view ivm_child as select c from child; > NOTICE: could not create an index on materialized view "ivm_child" automatically > HINT: Create an index on the materialized view for efficient incremental maintenance. > SELECT 0 > postgres=# insert into parent values (1); > INSERT 0 1 > postgres=# insert into child values (2); > INSERT 0 1 > postgres=# select * from parent; > c > --- > 1 > 2 > (2 rows) > > postgres=# select * from child; > c > --- > 1 > 2 > (2 rows) > > postgres=# select * from ivm_parent; > c > --- > 1 > (1 row) > > postgres=# select * from ivm_child; > c > --- > 2 > (1 row) > > > I think ivm_parent and ivm_child should return 2 rows. Good point! I'll investigate this more, but we may have to prohibit views on partitioned table and partitions. > (3) > I think IVM does not support foreign table, but try to make IVM. > > postgres=# create incremental materialized view ivm_foreign as select c from foreign_table; > NOTICE: could not create an index on materialized view "ivm_foreign" automatically > HINT: Create an index on the materialized view for efficient incremental maintenance. > ERROR: "foreign_table" is a foreign table > DETAIL: Triggers on foreign tables cannot have transition tables. > > It finally failed to make IVM, but I think it should be checked more early. You are right. We don't support foreign tables as long as we use triggers. I'll fix. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi Nagata-san, Thank you for your reply. > I'll investigate this more, but we may have to prohibit views on partitioned > table and partitions. I think this restriction is strict. This feature is useful when the base table is large and partitioning is also useful in such case. I have several additional comments on the patch. (1) The following features are added to transition table. - Prolong lifespan of transition table - If table has row security policies, set them to the transition table - Calculate pre-state of the table Are these features only for IVM? If there are other useful case, they should be separated from IVM patch and should be independent patch for transition table. (2) DEPENDENCY_IMMV (m) is added to deptype of pg_depend. What is the difference compared with existing deptype such as DEPENDENCY_INTERNAL (i)? (3) Converting from normal materialized view to IVM or from IVM to normal materialized view is not implemented yet. Is it difficult? I think create/drop triggers and __ivm_ columns can achieve this feature. Regards, Ryohei Takahashi
Hi hackers,
On Mon, 19 Jul 2021 09:24:30 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> On Wed, 14 Jul 2021 21:22:37 +0530
> vignesh C <vignesh21@gmail.com> wrote:
> > The patch does not apply on Head anymore, could you rebase and post a
> > patch. I'm changing the status to "Waiting for Author".
>
> Ok. I'll update the patch in a few days.
Attached is the latest patch set to add support for Incremental
Materialized View Maintenance (IVM)
The patches are rebased to the master and also revised with some
code cleaning.
IVM is a way to make materialized views up-to-date in which only
incremental changes are computed and applied on views rather than
recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
does. IVM can update materialized views more efficiently
than recomputation when only small part of the view need updates.
The patch set implements a feature so that materialized views could be
updated automatically and immediately when a base table is modified.
Currently, our IVM implementation supports views which could contain
tuple duplicates whose definition includes:
- inner and outer joins including self-join
- DISTINCT
- some built-in aggregate functions (count, sum, agv, min, and max)
- a part of subqueries
-- simple subqueries in FROM clause
-- EXISTS subqueries in WHERE clause
- CTEs
We hope the IVM feature would be adopted into pg15. However, the size of
patch set has grown too large through supporting above features. Therefore,
I think it is better to consider only a part of these features for the first
release. Especially, I would like propose the following features for pg15.
- inner joins including self-join
- DISTINCT and views with tuple duplicates
- some built-in aggregate functions (count, sum, agv, min, and max)
By omitting outer-join, sub-queries, and CTE features, the patch size becomes
less than half. I hope this will make a bit easer to review the IVM patch set.
Here is a list of separated patches.
- 0001: Add a new syntax:
CREATE INCREMENTAL MATERIALIZED VIEW
- 0002: Add a new column relisivm to pg_class
- 0003: Add new deptype option 'm' in pg_depend
- 0004: Change trigger.c to allow to prolong life span of tupestores
containing Transition Tables generated via AFTER trigger
- 0005: Add IVM supprot for pg_dump
- 0006: Add IVM support for psql
- 0007: Add the basic IVM future:
This supports inner joins, DISTINCT, and tuple duplicates.
- 0008: Add aggregates (count, sum, avg, min, max) support for IVM
- 0009: Add regression tests for IVM
- 0010: Add documentation for IVM
We could split the patch furthermore if this would make reviews much easer.
For example, I think 0007 could be split into the more basic part and the part
for handling tuple duplicates. Moreover, 0008 could be split into "min/max"
and other aggregates because handling min/max is a bit more complicated than
others.
I also attached IVM_extra.tar.gz that contains patches for sub-quereis,
outer-join, CTE support, just for your information.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
--
Yugo NAGATA <nagata@sraoss.co.jp>
+ rewritten->groupClause = transformDistinctClause(NULL, &rewritten->targetList, rewritten->sortClause, false);
+
+ /* Add count(*) for counting distinct tuples in views */
+ if (rewritten->distinctClause)
On Sun, Aug 1, 2021 at 11:30 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:Hi hackers,
On Mon, 19 Jul 2021 09:24:30 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> On Wed, 14 Jul 2021 21:22:37 +0530
> vignesh C <vignesh21@gmail.com> wrote:
> > The patch does not apply on Head anymore, could you rebase and post a
> > patch. I'm changing the status to "Waiting for Author".
>
> Ok. I'll update the patch in a few days.
Attached is the latest patch set to add support for Incremental
Materialized View Maintenance (IVM)
The patches are rebased to the master and also revised with some
code cleaning.
IVM is a way to make materialized views up-to-date in which only
incremental changes are computed and applied on views rather than
recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
does. IVM can update materialized views more efficiently
than recomputation when only small part of the view need updates.
The patch set implements a feature so that materialized views could be
updated automatically and immediately when a base table is modified.
Currently, our IVM implementation supports views which could contain
tuple duplicates whose definition includes:
- inner and outer joins including self-join
- DISTINCT
- some built-in aggregate functions (count, sum, agv, min, and max)
- a part of subqueries
-- simple subqueries in FROM clause
-- EXISTS subqueries in WHERE clause
- CTEs
We hope the IVM feature would be adopted into pg15. However, the size of
patch set has grown too large through supporting above features. Therefore,
I think it is better to consider only a part of these features for the first
release. Especially, I would like propose the following features for pg15.
- inner joins including self-join
- DISTINCT and views with tuple duplicates
- some built-in aggregate functions (count, sum, agv, min, and max)
By omitting outer-join, sub-queries, and CTE features, the patch size becomes
less than half. I hope this will make a bit easer to review the IVM patch set.
Here is a list of separated patches.
- 0001: Add a new syntax:
CREATE INCREMENTAL MATERIALIZED VIEW
- 0002: Add a new column relisivm to pg_class
- 0003: Add new deptype option 'm' in pg_depend
- 0004: Change trigger.c to allow to prolong life span of tupestores
containing Transition Tables generated via AFTER trigger
- 0005: Add IVM supprot for pg_dump
- 0006: Add IVM support for psql
- 0007: Add the basic IVM future:
This supports inner joins, DISTINCT, and tuple duplicates.
- 0008: Add aggregates (count, sum, avg, min, max) support for IVM
- 0009: Add regression tests for IVM
- 0010: Add documentation for IVM
We could split the patch furthermore if this would make reviews much easer.
For example, I think 0007 could be split into the more basic part and the part
for handling tuple duplicates. Moreover, 0008 could be split into "min/max"
and other aggregates because handling min/max is a bit more complicated than
others.
I also attached IVM_extra.tar.gz that contains patches for sub-quereis,
outer-join, CTE support, just for your information.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
--
Yugo NAGATA <nagata@sraoss.co.jp>Hi,For v23-0007-Add-Incremental-View-Maintenance-support.patch :bq. In this implementation, AFTER triggers are used to collecting tuplestores'to collecting' -> to collectbq. are contained in a old transition table.'a old' -> an oldbq. updates of more than one base tablesone base tables -> one base tablebq. DISTINCT and tuple duplicates in views are supportedSince distinct and duplicate have opposite meanings, it would be better to rephrase the above sentence.bq. The value in__ivm_count__ is updatedI searched the patch for in__ivm_count__ - there was no (second) match. I think there should be a space between in and underscore.+static void CreateIvmTriggersOnBaseTables_recurse(Query *qry, Node *node, Oid matviewOid, Relids *relids, bool ex_lock);nit: long line. please wrap.+ if (rewritten->distinctClause)
+ rewritten->groupClause = transformDistinctClause(NULL, &rewritten->targetList, rewritten->sortClause, false);
+
+ /* Add count(*) for counting distinct tuples in views */
+ if (rewritten->distinctClause)It seems the body of the two if statements can be combined into one.More to follow for this patch.Cheers
+ {
+ if (!bms_is_member(rte->relid, *relids))
+
+ if (node == NULL)
+ return false;
Hi Nagata-san, I'm still reading the patch. I have additional comments. (1) In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, ivm member is added to IntoClause struct. I think it is necessary to modify _copyIntoClause() and _equalIntoClause() functions. (2) By executing pg_dump with v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch, the constraint which is automatically created during "CREATE INCREMENTAL MATERIALIZED VIEW" is also dumped. This cause error during recovery as follows. ivm=# create table t (c1 int, c2 int); CREATE TABLE ivm=# create incremental materialized view ivm_t as select distinct c1 from t; NOTICE: created index "ivm_t_index" on materialized view "ivm_t" SELECT 0 Then I executed pg_dump. In the dump, the following SQLs appear. CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS SELECT DISTINCT t.c1 FROM public.t WITH NO DATA; ALTER TABLE ONLY public.ivm_t ADD CONSTRAINT ivm_t_index UNIQUE (c1); If I execute psql with the result of pg_dump, following error occurs. ERROR: ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t" DETAIL: This operation is not supported for materialized views. Regards, Ryohei Takahashi
Hello Zhihong Yu, Thank you for your suggestion! I am sorry for late replay. I'll fix them and submit the updated patch soon. On Sat, 7 Aug 2021 00:52:24 -0700 Zhihong Yu <zyu@yugabyte.com> wrote: > > Hi, > > For v23-0007-Add-Incremental-View-Maintenance-support.patch : > > > > bq. In this implementation, AFTER triggers are used to collecting > > tuplestores > > > > 'to collecting' -> to collect > > > > bq. are contained in a old transition table. > > > > 'a old' -> an old > > > > bq. updates of more than one base tables > > > > one base tables -> one base table I'll fix them. > > bq. DISTINCT and tuple duplicates in views are supported > > > > Since distinct and duplicate have opposite meanings, it would be better to > > rephrase the above sentence. I'll rewrite it to "Incrementally Maintainable Materialized Views (IMMV) can contain duplicated tuples. Also, DISTINCT clause is supported. " > > bq. The value in__ivm_count__ is updated > > > > I searched the patch for in__ivm_count__ - there was no (second) match. I > > think there should be a space between in and underscore. Yes, the space was missing. > > +static void CreateIvmTriggersOnBaseTables_recurse(Query *qry, Node *node, > > Oid matviewOid, Relids *relids, bool ex_lock); > > > > nit: long line. please wrap. OK. > > > > + if (rewritten->distinctClause) > > + rewritten->groupClause = transformDistinctClause(NULL, > > &rewritten->targetList, rewritten->sortClause, false); > > + > > + /* Add count(*) for counting distinct tuples in views */ > > + if (rewritten->distinctClause) > > > > It seems the body of the two if statements can be combined into one. Ok. > > + CreateIvmTriggersOnBaseTables_recurse(qry, (Node *)qry, matviewOid, > &relids, ex_lock); > > Looking at existing recursive functions, e.g. > > src/backend/executor/execPartition.c:find_matching_subplans_recurse(PartitionPruningData > *prunedata, > > the letters in the function name are all lower case. I think following the > convention would be nice. Ok. I'll rename this to CreateIvmTriggersOnBaseTablesRecurse since I found DeadLockCheckRecurse, transformExprRecurse, and so on. > > + if (rte->rtekind == RTE_RELATION) > + { > + if (!bms_is_member(rte->relid, *relids)) > > The conditions for the two if statements can be combined (saving some > indentation). Yes. I'll fix. > + check_stack_depth(); > + > + if (node == NULL) > + return false; > > It seems the node check can be placed ahead of the stack depth check. OK. > + * CreateindexOnIMMV > > CreateindexOnIMMV -> CreateIndexOnIMMV > > + (errmsg("could not create an index on materialized view > \"%s\" automatically", > > It would be nice to mention the reason is the lack of primary key. > > + /* create no index, just notice that an appropriate index is > necessary for efficient, IVM */ > > for efficient -> for efficiency. I'll fix them. Thanks. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hello Takahashi-san, On Thu, 5 Aug 2021 08:53:47 +0000 "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > Hi Nagata-san, > > > Thank you for your reply. > > > I'll investigate this more, but we may have to prohibit views on partitioned > > table and partitions. > > I think this restriction is strict. > This feature is useful when the base table is large and partitioning is also useful in such case. One reason of this issue is the lack of triggers on partitioned tables or partitions that are not specified in the view definition. However, even if we create triggers recursively on the parents or children, we would still need more consideration. This is because we will have to convert the format of tuple of modified table to the format of the table specified in the view for cases that the parent and some children have different format. I think supporting partitioned tables can be left for the next release. > > I have several additional comments on the patch. > > > (1) > The following features are added to transition table. > - Prolong lifespan of transition table > - If table has row security policies, set them to the transition table > - Calculate pre-state of the table > > Are these features only for IVM? > If there are other useful case, they should be separated from IVM patch and > should be independent patch for transition table. Maybe. However, we don't have good idea about use cases other than IVM of them for now... > > (2) > DEPENDENCY_IMMV (m) is added to deptype of pg_depend. > What is the difference compared with existing deptype such as DEPENDENCY_INTERNAL (i)? DEPENDENCY_IMMV was added to clear that a certain trigger is related to IMMV. We dropped the IVM trigger and its dependencies from IMMV when REFRESH ... WITH NO DATA is executed. Without the new deptype, we may accidentally delete a dependency created with an intention other than the IVM trigger. > (3) > Converting from normal materialized view to IVM or from IVM to normal materialized view is not implemented yet. > Is it difficult? > > I think create/drop triggers and __ivm_ columns can achieve this feature. I think it is harder than you expected. When an IMMV is switched to a normal materialized view, we needs to drop hidden columns (__ivm_count__ etc.), and in the opposite case, we need to create them again. The former (IMMV->IVM) might be easer, but for the latter (IVM->IMMV) I wonder we would need to re-create IMMV. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hello Takahashi-san, On Mon, 6 Sep 2021 10:06:37 +0000 "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > Hi Nagata-san, > > > I'm still reading the patch. > I have additional comments. Thank you for your comments! > > (1) > In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, ivm member is added to IntoClause struct. > I think it is necessary to modify _copyIntoClause() and _equalIntoClause() functions. Ok. I'll fix _copyIntoClause() and _equalIntoClause() as well as _readIntoClause() and _outIntoClause(). > (2) > By executing pg_dump with v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch, > the constraint which is automatically created during "CREATE INCREMENTAL MATERIALIZED VIEW" is also dumped. > This cause error during recovery as follows. > > ivm=# create table t (c1 int, c2 int); > CREATE TABLE > ivm=# create incremental materialized view ivm_t as select distinct c1 from t; > NOTICE: created index "ivm_t_index" on materialized view "ivm_t" > SELECT 0 > > Then I executed pg_dump. > > In the dump, the following SQLs appear. > > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS > SELECT DISTINCT t.c1 > FROM public.t > WITH NO DATA; > > ALTER TABLE ONLY public.ivm_t > ADD CONSTRAINT ivm_t_index UNIQUE (c1); > > If I execute psql with the result of pg_dump, following error occurs. > > ERROR: ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t" > DETAIL: This operation is not supported for materialized views. Good catch! It was my mistake creating unique constraints on IMMV in spite of we cannot defined them via SQL. I'll fix it to use unique indexes instead of constraints. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi hackers, I attached the updated patch including fixes reported by Zhihong Yu and Ryohei Takahashi. Regards, Yugo Nagata On Wed, 22 Sep 2021 19:12:27 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > Hello Takahashi-san, > > On Mon, 6 Sep 2021 10:06:37 +0000 > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > > > Hi Nagata-san, > > > > > > I'm still reading the patch. > > I have additional comments. > > Thank you for your comments! > > > > > (1) > > In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, ivm member is added to IntoClause struct. > > I think it is necessary to modify _copyIntoClause() and _equalIntoClause() functions. > > Ok. I'll fix _copyIntoClause() and _equalIntoClause() as well as _readIntoClause() and _outIntoClause(). > > > (2) > > By executing pg_dump with v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch, > > the constraint which is automatically created during "CREATE INCREMENTAL MATERIALIZED VIEW" is also dumped. > > This cause error during recovery as follows. > > > > ivm=# create table t (c1 int, c2 int); > > CREATE TABLE > > ivm=# create incremental materialized view ivm_t as select distinct c1 from t; > > NOTICE: created index "ivm_t_index" on materialized view "ivm_t" > > SELECT 0 > > > > Then I executed pg_dump. > > > > In the dump, the following SQLs appear. > > > > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS > > SELECT DISTINCT t.c1 > > FROM public.t > > WITH NO DATA; > > > > ALTER TABLE ONLY public.ivm_t > > ADD CONSTRAINT ivm_t_index UNIQUE (c1); > > > > If I execute psql with the result of pg_dump, following error occurs. > > > > ERROR: ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t" > > DETAIL: This operation is not supported for materialized views. > > Good catch! It was my mistake creating unique constraints on IMMV in spite of > we cannot defined them via SQL. I'll fix it to use unique indexes instead of > constraints. > > Regards, > Yugo Nagata > > -- > Yugo NAGATA <nagata@sraoss.co.jp> > > -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
On Wed, 22 Sep 2021 19:17:12 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > Hi hackers, > > I attached the updated patch including fixes reported by > Zhihong Yu and Ryohei Takahashi. Cfbot seems to fail to open the tar file, so I attached patch files instead of tar ball. Regards, Yugo Nagata > On Wed, 22 Sep 2021 19:12:27 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > Hello Takahashi-san, > > > > On Mon, 6 Sep 2021 10:06:37 +0000 > > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > > > > > Hi Nagata-san, > > > > > > > > > I'm still reading the patch. > > > I have additional comments. > > > > Thank you for your comments! > > > > > > > > (1) > > > In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, ivm member is added to IntoClause struct. > > > I think it is necessary to modify _copyIntoClause() and _equalIntoClause() functions. > > > > Ok. I'll fix _copyIntoClause() and _equalIntoClause() as well as _readIntoClause() and _outIntoClause(). > > > > > (2) > > > By executing pg_dump with v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch, > > > the constraint which is automatically created during "CREATE INCREMENTAL MATERIALIZED VIEW" is also dumped. > > > This cause error during recovery as follows. > > > > > > ivm=# create table t (c1 int, c2 int); > > > CREATE TABLE > > > ivm=# create incremental materialized view ivm_t as select distinct c1 from t; > > > NOTICE: created index "ivm_t_index" on materialized view "ivm_t" > > > SELECT 0 > > > > > > Then I executed pg_dump. > > > > > > In the dump, the following SQLs appear. > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS > > > SELECT DISTINCT t.c1 > > > FROM public.t > > > WITH NO DATA; > > > > > > ALTER TABLE ONLY public.ivm_t > > > ADD CONSTRAINT ivm_t_index UNIQUE (c1); > > > > > > If I execute psql with the result of pg_dump, following error occurs. > > > > > > ERROR: ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t" > > > DETAIL: This operation is not supported for materialized views. > > > > Good catch! It was my mistake creating unique constraints on IMMV in spite of > > we cannot defined them via SQL. I'll fix it to use unique indexes instead of > > constraints. > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > > > > > > -- > Yugo NAGATA <nagata@sraoss.co.jp> -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v24-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v24-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v24-0003-Add-new-deptype-option-m-in-pg_depend-system-cat.patch
- v24-0004-Allow-to-prolong-life-span-of-transition-tables-.patch
- v24-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v24-0006-Add-Incremental-View-Maintenance-support-to-psql.patch
- v24-0007-Add-Incremental-View-Maintenance-support.patch
- v24-0008-Add-aggregates-support-in-IVM.patch
- v24-0009-Add-regression-tests-for-Incremental-View-Mainte.patch
- v24-0010-Add-documentations-about-Incremental-View-Mainte.patch
Hello Takahashi-san, On Wed, 22 Sep 2021 18:53:43 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > Hello Takahashi-san, > > On Thu, 5 Aug 2021 08:53:47 +0000 > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > > > Hi Nagata-san, > > > > > > Thank you for your reply. > > > > > I'll investigate this more, but we may have to prohibit views on partitioned > > > table and partitions. > > > > I think this restriction is strict. > > This feature is useful when the base table is large and partitioning is also useful in such case. > > One reason of this issue is the lack of triggers on partitioned tables or partitions that > are not specified in the view definition. > > However, even if we create triggers recursively on the parents or children, we would still > need more consideration. This is because we will have to convert the format of tuple of > modified table to the format of the table specified in the view for cases that the parent > and some children have different format. > > I think supporting partitioned tables can be left for the next release. > > > > > I have several additional comments on the patch. > > > > > > (1) > > The following features are added to transition table. > > - Prolong lifespan of transition table > > - If table has row security policies, set them to the transition table > > - Calculate pre-state of the table > > > > Are these features only for IVM? > > If there are other useful case, they should be separated from IVM patch and > > should be independent patch for transition table. > > Maybe. However, we don't have good idea about use cases other than IVM of > them for now... > > > > > (2) > > DEPENDENCY_IMMV (m) is added to deptype of pg_depend. > > What is the difference compared with existing deptype such as DEPENDENCY_INTERNAL (i)? > > DEPENDENCY_IMMV was added to clear that a certain trigger is related to IMMV. > We dropped the IVM trigger and its dependencies from IMMV when REFRESH ... WITH NO DATA > is executed. Without the new deptype, we may accidentally delete a dependency created > with an intention other than the IVM trigger. > > > (3) > > Converting from normal materialized view to IVM or from IVM to normal materialized view is not implemented yet. > > Is it difficult? > > > > I think create/drop triggers and __ivm_ columns can achieve this feature. > > I think it is harder than you expected. When an IMMV is switched to a normal > materialized view, we needs to drop hidden columns (__ivm_count__ etc.), and in > the opposite case, we need to create them again. The former (IMMV->IVM) might be > easer, but for the latter (IVM->IMMV) I wonder we would need to re-create IMMV. I am sorry but I found a mistake in the above description. "IMMV->IVM" and "IVM->IMMV" were wrong. I've should use "IMMV->MV" and "MV->IMMV" where MV means normal materialized view.w. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi hackers, I attached the rebased patch set. Regards, Yugo Nagata On Thu, 23 Sep 2021 04:57:30 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Wed, 22 Sep 2021 19:17:12 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > Hi hackers, > > > > I attached the updated patch including fixes reported by > > Zhihong Yu and Ryohei Takahashi. > > Cfbot seems to fail to open the tar file, so I attached > patch files instead of tar ball. > > Regards, > Yugo Nagata > > > > On Wed, 22 Sep 2021 19:12:27 +0900 > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > Hello Takahashi-san, > > > > > > On Mon, 6 Sep 2021 10:06:37 +0000 > > > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > > > > > > > Hi Nagata-san, > > > > > > > > > > > > I'm still reading the patch. > > > > I have additional comments. > > > > > > Thank you for your comments! > > > > > > > > > > > (1) > > > > In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, ivm member is added to IntoClause struct. > > > > I think it is necessary to modify _copyIntoClause() and _equalIntoClause() functions. > > > > > > Ok. I'll fix _copyIntoClause() and _equalIntoClause() as well as _readIntoClause() and _outIntoClause(). > > > > > > > (2) > > > > By executing pg_dump with v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch, > > > > the constraint which is automatically created during "CREATE INCREMENTAL MATERIALIZED VIEW" is also dumped. > > > > This cause error during recovery as follows. > > > > > > > > ivm=# create table t (c1 int, c2 int); > > > > CREATE TABLE > > > > ivm=# create incremental materialized view ivm_t as select distinct c1 from t; > > > > NOTICE: created index "ivm_t_index" on materialized view "ivm_t" > > > > SELECT 0 > > > > > > > > Then I executed pg_dump. > > > > > > > > In the dump, the following SQLs appear. > > > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS > > > > SELECT DISTINCT t.c1 > > > > FROM public.t > > > > WITH NO DATA; > > > > > > > > ALTER TABLE ONLY public.ivm_t > > > > ADD CONSTRAINT ivm_t_index UNIQUE (c1); > > > > > > > > If I execute psql with the result of pg_dump, following error occurs. > > > > > > > > ERROR: ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t" > > > > DETAIL: This operation is not supported for materialized views. > > > > > > Good catch! It was my mistake creating unique constraints on IMMV in spite of > > > we cannot defined them via SQL. I'll fix it to use unique indexes instead of > > > constraints. > > > > > > Regards, > > > Yugo Nagata > > > > > > -- > > > Yugo NAGATA <nagata@sraoss.co.jp> > > > > > > > > > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > > -- > Yugo NAGATA <nagata@sraoss.co.jp> -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v24-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v24-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v24-0003-Add-new-deptype-option-m-in-pg_depend-system-cat.patch
- v24-0004-Allow-to-prolong-life-span-of-transition-tables-.patch
- v24-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v24-0006-Add-Incremental-View-Maintenance-support-to-psql.patch
- v24-0007-Add-Incremental-View-Maintenance-support.patch
- v24-0008-Add-aggregates-support-in-IVM.patch
- v24-0009-Add-regression-tests-for-Incremental-View-Mainte.patch
- v24-0010-Add-documentations-about-Incremental-View-Mainte.patch
Hi Nagata-san, Sorry for late reply. > However, even if we create triggers recursively on the parents or children, we would still > need more consideration. This is because we will have to convert the format of tuple of > modified table to the format of the table specified in the view for cases that the parent > and some children have different format. > > I think supporting partitioned tables can be left for the next release. OK. I understand. In the v24-patch, creating IVM on partions or partition table is prohibited. It is OK but it should be documented. Perhaps, the following statement describe this. If so, I think the definition of "simple base table" is ambiguous for some users. + IMMVs must be based on simple base tables. It's not supported to + create them on top of views or materialized views. > DEPENDENCY_IMMV was added to clear that a certain trigger is related to IMMV. > We dropped the IVM trigger and its dependencies from IMMV when REFRESH ... WITH NO DATA > is executed. Without the new deptype, we may accidentally delete a dependency created > with an intention other than the IVM trigger. OK. I understand. > I think it is harder than you expected. When an IMMV is switched to a normal > materialized view, we needs to drop hidden columns (__ivm_count__ etc.), and in > the opposite case, we need to create them again. The former (IMMV->IVM) might be > easer, but for the latter (IVM->IMMV) I wonder we would need to re-create > IMMV. OK. I understand. Regards, Ryohei Takahashi
Hi Nagata-san, > Ok. I'll fix _copyIntoClause() and _equalIntoClause() as well as _readIntoClause() > and _outIntoClause(). OK. > > ivm=# create table t (c1 int, c2 int); > > CREATE TABLE > > ivm=# create incremental materialized view ivm_t as select distinct c1 from t; > > NOTICE: created index "ivm_t_index" on materialized view "ivm_t" > > SELECT 0 > > > > Then I executed pg_dump. > > > > In the dump, the following SQLs appear. > > > > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS > > SELECT DISTINCT t.c1 > > FROM public.t > > WITH NO DATA; > > > > ALTER TABLE ONLY public.ivm_t > > ADD CONSTRAINT ivm_t_index UNIQUE (c1); > > > > If I execute psql with the result of pg_dump, following error occurs. > > > > ERROR: ALTER action ADD CONSTRAINT cannot be performed on relation > "ivm_t" > > DETAIL: This operation is not supported for materialized views. > > Good catch! It was my mistake creating unique constraints on IMMV in spite of > we cannot defined them via SQL. I'll fix it to use unique indexes instead of > constraints. I checked the same procedure on v24 patch. But following error occurs instead of the original error. ERROR: relation "ivm_t_index" already exists Regards, Ryohei Takahashi
Hello Takahashi-san, On Wed, 24 Nov 2021 04:27:13 +0000 "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > Hi Nagata-san, > > > Sorry for late reply. > > > > However, even if we create triggers recursively on the parents or children, we would still > > need more consideration. This is because we will have to convert the format of tuple of > > modified table to the format of the table specified in the view for cases that the parent > > and some children have different format. > > > > I think supporting partitioned tables can be left for the next release. > > OK. I understand. > In the v24-patch, creating IVM on partions or partition table is prohibited. > It is OK but it should be documented. > > Perhaps, the following statement describe this. > If so, I think the definition of "simple base table" is ambiguous for some users. > > + IMMVs must be based on simple base tables. It's not supported to > + create them on top of views or materialized views. Oh, I forgot to fix the documentation. I'll fix it. Ragards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Wed, 24 Nov 2021 04:31:25 +0000 "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > > > ivm=# create table t (c1 int, c2 int); > > > CREATE TABLE > > > ivm=# create incremental materialized view ivm_t as select distinct c1 from t; > > > NOTICE: created index "ivm_t_index" on materialized view "ivm_t" > > > SELECT 0 > > > > > > Then I executed pg_dump. > > > > > > In the dump, the following SQLs appear. > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS > > > SELECT DISTINCT t.c1 > > > FROM public.t > > > WITH NO DATA; > > > > > > ALTER TABLE ONLY public.ivm_t > > > ADD CONSTRAINT ivm_t_index UNIQUE (c1); > > > > > > If I execute psql with the result of pg_dump, following error occurs. > > > > > > ERROR: ALTER action ADD CONSTRAINT cannot be performed on relation > > "ivm_t" > > > DETAIL: This operation is not supported for materialized views. > > > > Good catch! It was my mistake creating unique constraints on IMMV in spite of > > we cannot defined them via SQL. I'll fix it to use unique indexes instead of > > constraints. > > I checked the same procedure on v24 patch. > But following error occurs instead of the original error. > > ERROR: relation "ivm_t_index" already exists Thank you for pointing out it! Hmmm, an index is created when IMMV is defined, so CREAE INDEX called after this would fail... Maybe, we should not create any index automatically if IMMV is created WITH NO DATA. I'll fix it after some investigation. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi hackers, This is a response to a comment in "Commitfest 2021-11 Patch Triage - Part 1" [1]. > 2138: Incremental Materialized View Maintenance > =============================================== > There seems to be concensus on the thread that this is a feature that we want, > and after initial design discussions there seems to be no disagreements with > the approach taken. The patch was marked ready for committer almost a year > ago, but have since been needs review (which seems correct). The size of the > patchset and the length of the thread make it hard to gauge just far away it > is, maybe the author or a review can summarize the current state and outline > what is left for it to be committable. [1] https://www.postgresql.org/message-id/6EDAAF93-1663-41D0-9148-76739104943E%40yesql.se I'll describe recent discussions and current status of this thread. * Recent Discussions and Current status 1. Previously, we proposed a patchset that supports outer-joins, some sub-queries and CTEs. However, aiming to reduce the size of the patchset, I proposed to omit these features from the first version of the patch in my post at 2021-08-02 [2]. Currently, we are proposing Incremental View Maintenance feature for PostgreSQL 15 that supports following queries in the view definition query. - inner joins including self-join - DISTINCT and views with tuple duplicates - some built-in aggregate functions (count, sum, agv, min, and max) Is it OK? Although there has been no opposite opinion, we want to confirm it. [2] https://www.postgresql.org/message-id/20210802152834.ecbaba6e17d1957547c3a55d%40sraoss.co.jp 2. Recently, There was a suggestion that we should support partitioned tables from Ryohei Takahashi, but I decided to not support it in the first release of IVM. Takahshi-san agreed with it, and the documentation will be fixed soon [3]. [3] https://www.postgresql.org/message-id/20211125154717.777e9d35ddde5f2e0d5d8355%40sraoss.co.jp 3. Takahashi-san pointed out that restoring pg_dump results causes an error. I am fixing it now.[4] [4] https://www.postgresql.org/message-id/20211125163710.2f32ae3d4be5d5f9ade020b6%40sraoss.co.jp The remaining is the summary of our proposal of IVM feature, its design, and past discussions. --------------------------------------------------------------------------------------- * Features Incremental View Maintenance (IVM) is a way to make materialized views up-to-date by computing only incremental changes and applying them on views. IVM is more efficient than REFRESH MATERIALIZED VIEW when only small parts of the view are changed. This patchset provides a feature that allows materialized views to be updated automatically and incrementally just after a underlying table is modified. You can create an incementally maintainable materialized view (IMMV) by using CREATE INCREMENTAL MATERIALIZED VIEW command. The followings are supported in view definition queries: - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) - some built-in aggregate functions (count, sum, avg, min, max) - GROUP BY clause - DISTINCT clause Views can contain multiple tuples with the same content (duplicate tuples). The following are not supported in a view definition: - Outer joins - Aggregates otehr than above, window functions, HAVING - Sub-queries, CTEs - Set operations (UNION, INTERSECT, EXCEPT) - DISTINCT ON, ORDER BY, LIMIT, OFFSET Also, a view definition query cannot contain other views, materialized views, foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, system columns, or expressions that contains aggregates. --------------------------------------------------------------------------------------- * Design An IMMV is maintained using statement-level AFTER triggers. When an IMMV is created, triggers are automatically created on all base tables contained in the view definition query. When a table is modified, the change that occurred in the table are extracted as transition tables in the AFTER triggers. Then, changes that will occur in the view are calculated by a rewritten view dequery in which the modified table is replaced with the transition table. For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted into R are stored in a transiton table dR, the tuples that will be inserted into the view are calculated as the result of "SELECT * FROM dR, S". ** Multiple Tables Modification Multiple tables can be modified in a statement when using triggers, foreign key constraint, or modifying CTEs. When multiple tables are modified, we need the state of tables before the modification. For example, when some tuples, dR and dS, are inserted into R and S respectively, the tuples that will be inserted into the view are calculated by the following two queries: "SELECT * FROM dR, S_pre" "SELECT * FROM R, dS" where S_pre is the table before the modification, R is the current state of table, that is, after the modification. This pre-update states of table is calculated by filtering inserted tuples using cmin/xmin system columns, and appending deleted tuples which are contained in the old transition table. This is implemented in get_prestate_rte(). Transition tables for each modification are collected in each AFTER trigger function call. Then, the view maintenance is performed in the last call of the trigger. In the original PostgreSQL, tuplestores of transition tables are freed at the end of each nested query. However, their lifespan needs to be prolonged to the end of the out-most query in order to maintain the view in the last AFTER trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c. ** Duplicate Tulpes When calculating changes that will occur in the view (= delta tables), multiplicity of tuples are calculated by using count(*). When deleting tuples from the view, tuples to be deleted are identified by joining the delta table with the view, and the tuples are deleted as many as specified multiplicity by numbered using row_number() function. This is implemented in apply_old_delta(). When inserting tuples into the view, tuples are duplicated to the specified multiplicity using generate_series() function. This is implemented in apply_new_delta(). ** DISTINCT clause When DISTINCT is used, the view has a hidden column __ivm_count__ that stores multiplicity for tuples. When tuples are deleted from or inserted into the view, the values of __ivm_count__ column is decreased or increased as many as specified multiplicity. Eventually, when the values becomes zero, the corresponding tuple is deleted from the view. This is implemented in apply_old_delta_with_count() and apply_new_delta_with_count(). ** Aggregates Built-in count sum, avg, min, and max are supported. Whether a given aggregate function can be used or not is checked by using its OID in check_aggregate_supports_ivm(). When creating a materialized view containing aggregates, in addition to __ivm_count__, more than one hidden columns for each aggregate are added to the target list. For example, columns for storing sum(x), count(x) are added if we have avg(x). When the view is maintained, aggregated values are updated using these hidden columns, also hidden columns are updated at the same time. The maintenance of aggregated view is performed in apply_old_delta_with_count() and apply_new_delta_with_count(). The SET clauses for updating columns are generated by append_set_clause_*(). If the view has min(x) or max(x) and the minimum or maximal value is deleted from a table, we need to update the value to the new min/max recalculated from the tables rather than incremental computation. This is performed in recalc_and_set_values(). --------------------------------------------------------------------------------------- * Discussion ** Aggregate support There were a few suggestions that general aggregate functions should be supported [5][6], which may be possible by extending pg_aggregate catalog. However, we decided to left supporting general aggregates to the future work [7] because it would need substantial works and make the patch more complex and bigger. There has been no opposite opinion on this. [5] https://www.postgresql.org/message-id/20191128140333.GA25947%40alvherre.pgsql [6] https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com [7] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp ** Hidden columns Columns starting with "__ivm_" are hidden columns that doesn't appear when a view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is fixed. There was a proposal to enable hidden columns by adding a new flag to pg_attribute [8], but this thread is no longer active, so we decided to check the hidden column by its name [9]. [8] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com [9] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp ** Concurrent Transactions When the view definition has more than one table, we acquire an exclusive lock before the view maintenance in order to avoid inconsistent results. This behavior was explained in [10]. The lock was improved to use weaker lock when the view has only one table based on a suggestion from Konstantin Knizhnik [11]. [10] https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp [11] https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d279567e2a%40postgrespro.ru ** Automatic Index Creation When a view is created, a unique index is automatically created if possible, that is, if the view definition query has a GROUP BY or DISTINCT, or if the view contains all primary key attributes of its base tables in the target list. It is necessary for efficient view maintenance. This feature is based on a suggestion from Konstantin Knizhnik [12]. [12] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru ** Others There are some other changes in core for IVM implementation. There has been no opposite opinion on any ever. - syntax The command to create an incrementally maintainable materialized view (IMMV) is "CREATE INCREMENTAL MATERIALIZED VIEW". The new keyword "INCREMENTAL" is added. - pg_class A new attribue "relisivm" is added to pg_class to indicate that the relation is an IMMV. - deptype DEPENDENCY_IMMV(m) was added to pg_depend as a new deptype. This is necessary to clear that a certain trigger is related to IMMV, especially when We dropped IVM triggers from the view when REFRESH ... WITH NO DATA is executed [13]. [13] https://www.postgresql.org/message-id/20210922185343.548883e81b8baef14a0193c5%40sraoss.co.jp --------------------------------------------------------------------------------------- Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi, On Thu, Nov 25, 2021 at 04:37:10PM +0900, Yugo NAGATA wrote: > On Wed, 24 Nov 2021 04:31:25 +0000 > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > > > > > I checked the same procedure on v24 patch. > > But following error occurs instead of the original error. > > > > ERROR: relation "ivm_t_index" already exists > > Thank you for pointing out it! > > Hmmm, an index is created when IMMV is defined, so CREAE INDEX called > after this would fail... Maybe, we should not create any index automatically > if IMMV is created WITH NO DATA. > > I'll fix it after some investigation. Are you still investigating on that problem? Also, the patchset doesn't apply anymore: http://cfbot.cputube.org/patch_36_2138.log === Applying patches on top of PostgreSQL commit ID a18b6d2dc288dfa6e7905ede1d4462edd6a8af47 === [...] === applying patch ./v24-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch patching file src/bin/pg_dump/pg_dump.c Hunk #1 FAILED at 6393. Hunk #2 FAILED at 6596. Hunk #3 FAILED at 6719. Hunk #4 FAILED at 6796. Hunk #5 succeeded at 14953 (offset -915 lines). 4 out of 5 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej There isn't any answer to your following email summarizing the feature yet, so I'm not sure what should be the status of this patch, as there's no ideal category for that. For now I'll change the patch to Waiting on Author on the cf app, feel free to switch it back to Needs Review if you think it's more suitable, at least for the design discussion need.
Hi, On Thu, 13 Jan 2022 18:23:42 +0800 Julien Rouhaud <rjuju123@gmail.com> wrote: > Hi, > > On Thu, Nov 25, 2021 at 04:37:10PM +0900, Yugo NAGATA wrote: > > On Wed, 24 Nov 2021 04:31:25 +0000 > > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > > > > > > > > I checked the same procedure on v24 patch. > > > But following error occurs instead of the original error. > > > > > > ERROR: relation "ivm_t_index" already exists > > > > Thank you for pointing out it! > > > > Hmmm, an index is created when IMMV is defined, so CREAE INDEX called > > after this would fail... Maybe, we should not create any index automatically > > if IMMV is created WITH NO DATA. > > > > I'll fix it after some investigation. > > Are you still investigating on that problem? Also, the patchset doesn't apply > anymore: I attached the updated and rebased patch set. I fixed to not create a unique index when an IMMV is created WITH NO DATA. Instead, the index is created by REFRESH WITH DATA only when the same one is not created yet. Also, I fixed the documentation to describe that foreign tables and partitioned tables are not supported according with Takahashi-san's suggestion. > There isn't any answer to your following email summarizing the feature yet, so > I'm not sure what should be the status of this patch, as there's no ideal > category for that. For now I'll change the patch to Waiting on Author on the > cf app, feel free to switch it back to Needs Review if you think it's more > suitable, at least for the design discussion need. I changed the status to Needs Review. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi,
On Thu, 13 Jan 2022 18:23:42 +0800
Julien Rouhaud <rjuju123@gmail.com> wrote:
> Hi,
>
> On Thu, Nov 25, 2021 at 04:37:10PM +0900, Yugo NAGATA wrote:
> > On Wed, 24 Nov 2021 04:31:25 +0000
> > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:
> >
> > >
> > > I checked the same procedure on v24 patch.
> > > But following error occurs instead of the original error.
> > >
> > > ERROR: relation "ivm_t_index" already exists
> >
> > Thank you for pointing out it!
> >
> > Hmmm, an index is created when IMMV is defined, so CREAE INDEX called
> > after this would fail... Maybe, we should not create any index automatically
> > if IMMV is created WITH NO DATA.
> >
> > I'll fix it after some investigation.
>
> Are you still investigating on that problem? Also, the patchset doesn't apply
> anymore:
I attached the updated and rebased patch set.
I fixed to not create a unique index when an IMMV is created WITH NO DATA.
Instead, the index is created by REFRESH WITH DATA only when the same one
is not created yet.
Also, I fixed the documentation to describe that foreign tables and partitioned
tables are not supported according with Takahashi-san's suggestion.
> There isn't any answer to your following email summarizing the feature yet, so
> I'm not sure what should be the status of this patch, as there's no ideal
> category for that. For now I'll change the patch to Waiting on Author on the
> cf app, feel free to switch it back to Needs Review if you think it's more
> suitable, at least for the design discussion need.
I changed the status to Needs Review.
On Thu, 3 Feb 2022 08:48:00 -0800 Zhihong Yu <zyu@yugabyte.com> wrote: > On Thu, Feb 3, 2022 at 8:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > Hi, > > > > On Thu, 13 Jan 2022 18:23:42 +0800 > > Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > Hi, > > > > > > On Thu, Nov 25, 2021 at 04:37:10PM +0900, Yugo NAGATA wrote: > > > > On Wed, 24 Nov 2021 04:31:25 +0000 > > > > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote: > > > > > > > > > > > > > > I checked the same procedure on v24 patch. > > > > > But following error occurs instead of the original error. > > > > > > > > > > ERROR: relation "ivm_t_index" already exists > > > > > > > > Thank you for pointing out it! > > > > > > > > Hmmm, an index is created when IMMV is defined, so CREAE INDEX called > > > > after this would fail... Maybe, we should not create any index > > automatically > > > > if IMMV is created WITH NO DATA. > > > > > > > > I'll fix it after some investigation. > > > > > > Are you still investigating on that problem? Also, the patchset doesn't > > apply > > > anymore: > > > > I attached the updated and rebased patch set. > > > > I fixed to not create a unique index when an IMMV is created WITH NO DATA. > > Instead, the index is created by REFRESH WITH DATA only when the same one > > is not created yet. > > > > Also, I fixed the documentation to describe that foreign tables and > > partitioned > > tables are not supported according with Takahashi-san's suggestion. > > > > > There isn't any answer to your following email summarizing the feature > > yet, so > > > I'm not sure what should be the status of this patch, as there's no ideal > > > category for that. For now I'll change the patch to Waiting on Author > > on the > > > cf app, feel free to switch it back to Needs Review if you think it's > > more > > > suitable, at least for the design discussion need. > > > > I changed the status to Needs Review. > > > > > > Hi, > Did you intend to attach updated patch ? > > I don't seem to find any. Oops, I attached. Thanks! -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v25-0010-Add-documentations-about-Incremental-View-Mainte.patch
- v25-0009-Add-regression-tests-for-Incremental-View-Mainte.patch
- v25-0008-Add-aggregates-support-in-IVM.patch
- v25-0007-Add-Incremental-View-Maintenance-support.patch
- v25-0006-Add-Incremental-View-Maintenance-support-to-psql.patch
- v25-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v25-0004-Allow-to-prolong-life-span-of-transition-tables-.patch
- v25-0003-Add-new-deptype-option-m-in-pg_depend-system-cat.patch
- v25-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v25-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
On Thu, 3 Feb 2022 08:48:00 -0800
Zhihong Yu <zyu@yugabyte.com> wrote:
> On Thu, Feb 3, 2022 at 8:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>
> > Hi,
> >
> > On Thu, 13 Jan 2022 18:23:42 +0800
> > Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > On Thu, Nov 25, 2021 at 04:37:10PM +0900, Yugo NAGATA wrote:
> > > > On Wed, 24 Nov 2021 04:31:25 +0000
> > > > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:
> > > >
> > > > >
> > > > > I checked the same procedure on v24 patch.
> > > > > But following error occurs instead of the original error.
> > > > >
> > > > > ERROR: relation "ivm_t_index" already exists
> > > >
> > > > Thank you for pointing out it!
> > > >
> > > > Hmmm, an index is created when IMMV is defined, so CREAE INDEX called
> > > > after this would fail... Maybe, we should not create any index
> > automatically
> > > > if IMMV is created WITH NO DATA.
> > > >
> > > > I'll fix it after some investigation.
> > >
> > > Are you still investigating on that problem? Also, the patchset doesn't
> > apply
> > > anymore:
> >
> > I attached the updated and rebased patch set.
> >
> > I fixed to not create a unique index when an IMMV is created WITH NO DATA.
> > Instead, the index is created by REFRESH WITH DATA only when the same one
> > is not created yet.
> >
> > Also, I fixed the documentation to describe that foreign tables and
> > partitioned
> > tables are not supported according with Takahashi-san's suggestion.
> >
> > > There isn't any answer to your following email summarizing the feature
> > yet, so
> > > I'm not sure what should be the status of this patch, as there's no ideal
> > > category for that. For now I'll change the patch to Waiting on Author
> > on the
> > > cf app, feel free to switch it back to Needs Review if you think it's
> > more
> > > suitable, at least for the design discussion need.
> >
> > I changed the status to Needs Review.
> >
> >
> > Hi,
> Did you intend to attach updated patch ?
>
> I don't seem to find any.
Oops, I attached. Thanks!
+ expression_tree_walker(node, check_ivm_restriction_walker, NULL);
+ break;
On Wed, 16 Feb 2022 22:34:18 +0800 huyajun <hu_yajun@qq.com> wrote: > Hi, Nagata-san > I am very interested in IMMV and read your patch but have some comments in v25-0007-Add-Incremental-View-Maintenance-support.patchand want to discuss with you. Thank you for your review! > > + /* For IMMV, we need to rewrite matview query */ > + query = rewriteQueryForIMMV(query, into->colNames); > + query_immv = copyObject(query); > > /* Create triggers on incremental maintainable materialized view */ > + Assert(query_immv != NULL); > + CreateIvmTriggersOnBaseTables(query_immv, matviewOid, true); > 1. Do we need copy query?Is it okay that CreateIvmTriggersOnBaseTables directly use (Query *) into->viewQuery insteadof query_immv like CreateIndexOnIMMV? It seems only planner may change query, but it shouldn't affect us findingthe correct base table in CreateIvmTriggersOnBaseTables . The copy to query_immv was necessary for supporting sub-queries in the view definition. However, we excluded the fueature from the current patch to reduce the patch size, so it would be unnecessary. I'll fix it. > > +void > +CreateIndexOnIMMV(Query *query, Relation matviewRel) > +{ > + Query *qry = (Query *) copyObject(query); > 2. Also, is it okay to not copy query in CreateIndexOnIMMV? It seems we only read query in CreateIndexOnIMMV. This was also necessary for supporting CTEs, but unnecessary in the current patch, so I'll fix it, too. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hi, I attached the updated patch-set (v26). > On Wed, 16 Feb 2022 22:34:18 +0800 > huyajun <hu_yajun@qq.com> wrote: > > > Hi, Nagata-san > > I am very interested in IMMV and read your patch but have some comments in v25-0007-Add-Incremental-View-Maintenance-support.patchand want to discuss with you. > > Thank you for your review! > > > > > + /* For IMMV, we need to rewrite matview query */ > > + query = rewriteQueryForIMMV(query, into->colNames); > > + query_immv = copyObject(query); > > > > /* Create triggers on incremental maintainable materialized view */ > > + Assert(query_immv != NULL); > > + CreateIvmTriggersOnBaseTables(query_immv, matviewOid, true); > > 1. Do we need copy query?Is it okay that CreateIvmTriggersOnBaseTables directly use (Query *) into->viewQuery insteadof query_immv like CreateIndexOnIMMV? It seems only planner may change query, but it shouldn't affect us findingthe correct base table in CreateIvmTriggersOnBaseTables . > > The copy to query_immv was necessary for supporting sub-queries in the view > definition. However, we excluded the fueature from the current patch to reduce > the patch size, so it would be unnecessary. I'll fix it. > > > > > +void > > +CreateIndexOnIMMV(Query *query, Relation matviewRel) > > +{ > > + Query *qry = (Query *) copyObject(query); > > 2. Also, is it okay to not copy query in CreateIndexOnIMMV? It seems we only read query in CreateIndexOnIMMV. > > This was also necessary for supporting CTEs, but unnecessary in the current > patch, so I'll fix it, too. I removed unnecessary copies of Query in according with the suggestions from huyajun, and fix wrong codes in a "switch" statement pointed out by Zhihong Yu. In addition, I made the following fixes: - Fix psql tab-completion code according with master branch - Fix auto-index-creation that didn't work well in REFRESH command - Add documentation description about the automatic index creation Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v26-0010-Add-documentations-about-Incremental-View-Mainte.patch
- v26-0009-Add-regression-tests-for-Incremental-View-Mainte.patch
- v26-0008-Add-aggregates-support-in-IVM.patch
- v26-0007-Add-Incremental-View-Maintenance-support.patch
- v26-0006-Add-Incremental-View-Maintenance-support-to-psql.patch
- v26-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v26-0004-Allow-to-prolong-life-span-of-transition-tables-.patch
- v26-0003-Add-new-deptype-option-m-in-pg_depend-system-cat.patch
- v26-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v26-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
Hello Zhihong Yu, I already replied to your comments before, but I forgot to include the list to CC, so I resend the same again. Sorry for the duplicate emails. On Thu, 3 Feb 2022 09:51:52 -0800 Zhihong Yu <zyu@yugabyte.com> wrote: > For CreateIndexOnIMMV(): > > + ereport(NOTICE, > + (errmsg("could not create an index on materialized view > \"%s\" automatically", > ... > + return; > + } > > Should the return type be changed to bool so that the caller knows whether > the index creation succeeds ? > If index creation is unsuccessful, should the call > to CreateIvmTriggersOnBaseTables() be skipped ? CreateIvmTriggersOnBaseTables() have to be called regardless of whether an index is created successfully or not, so I think CreateindexOnIMMV() doesn't have to return the result for now. > For check_ivm_restriction_walker(): > > + break; > + expression_tree_walker(node, check_ivm_restriction_walker, > NULL); > + break; > > Something is missing between the break and expression_tree_walker(). Yes, it's my mistake during making the patch-set. I fixed it in the updated patch I attached in the other post. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
This patch has bitrotted due to some other patch affecting trigger.c. Could you post a rebase? This is the last week of the CF before feature freeze so time is of the essence.
Hi, On Fri, 1 Apr 2022 11:09:16 -0400 Greg Stark <stark@mit.edu> wrote: > This patch has bitrotted due to some other patch affecting trigger.c. > > Could you post a rebase? > > This is the last week of the CF before feature freeze so time is of the essence. I attached a rebased patch-set. Also, I made the folowing changes from the previous. 1. Fix to not use a new deptye In the previous patch, we introduced a new deptye 'm' into pg_depend. This deptype was used for looking for IVM triggers to be removed at REFRESH WITH NO DATA. However, we decided to not use it for reducing unnecessary change in the core code. Currently, the trigger name and dependent objclass are used at that time instead of it. As a result, the number of patches are reduced to nine from ten. 2. Bump the version numbers in psql and pg_dump This feature's target is PG 16 now. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v27-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v27-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v27-0003-Allow-to-prolong-life-span-of-transition-tables-.patch
- v27-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v27-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
- v27-0006-Add-Incremental-View-Maintenance-support.patch
- v27-0007-Add-aggregates-support-in-IVM.patch
- v27-0008-Add-regression-tests-for-Incremental-View-Mainte.patch
- v27-0009-Add-documentations-about-Incremental-View-Mainte.patch
On Fri, 22 Apr 2022 11:29:39 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > Hi, > > On Fri, 1 Apr 2022 11:09:16 -0400 > Greg Stark <stark@mit.edu> wrote: > > > This patch has bitrotted due to some other patch affecting trigger.c. > > > > Could you post a rebase? > > > > This is the last week of the CF before feature freeze so time is of the essence. > > I attached a rebased patch-set. > > Also, I made the folowing changes from the previous. > > 1. Fix to not use a new deptye > > In the previous patch, we introduced a new deptye 'm' into pg_depend. > This deptype was used for looking for IVM triggers to be removed at > REFRESH WITH NO DATA. However, we decided to not use it for reducing > unnecessary change in the core code. Currently, the trigger name and > dependent objclass are used at that time instead of it. > > As a result, the number of patches are reduced to nine from ten. > 2. Bump the version numbers in psql and pg_dump > > This feature's target is PG 16 now. Sorry, I revert this change. It was too early to bump up the version number. -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v27-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v27-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v27-0003-Allow-to-prolong-life-span-of-transition-tables-.patch
- v27-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v27-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
- v27-0006-Add-Incremental-View-Maintenance-support.patch
- v27-0007-Add-aggregates-support-in-IVM.patch
- v27-0008-Add-regression-tests-for-Incremental-View-Mainte.patch
- v27-0009-Add-documentations-about-Incremental-View-Mainte.patch
On Fri, 22 Apr 2022 11:29:39 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> Hi,
>
> On Fri, 1 Apr 2022 11:09:16 -0400
> Greg Stark <stark@mit.edu> wrote:
>
> > This patch has bitrotted due to some other patch affecting trigger.c.
> >
> > Could you post a rebase?
> >
> > This is the last week of the CF before feature freeze so time is of the essence.
>
> I attached a rebased patch-set.
>
> Also, I made the folowing changes from the previous.
>
> 1. Fix to not use a new deptye
>
> In the previous patch, we introduced a new deptye 'm' into pg_depend.
> This deptype was used for looking for IVM triggers to be removed at
> REFRESH WITH NO DATA. However, we decided to not use it for reducing
> unnecessary change in the core code. Currently, the trigger name and
> dependent objclass are used at that time instead of it.
>
> As a result, the number of patches are reduced to nine from ten.
> 2. Bump the version numbers in psql and pg_dump
>
> This feature's target is PG 16 now.
Sorry, I revert this change. It was too early to bump up the
version number.
--
Yugo NAGATA <nagata@sraoss.co.jp>
Hello Greg, On Sat, 23 Apr 2022 08:18:01 +0200 Greg Stark <stark@mit.edu> wrote: > I'm trying to figure out how to get this feature more attention. Everyone > agrees it would be a huge help but it's a scary patch to review. > > I wonder if it would be helpful to have a kind of "readers guide" > explanation of the patches to help a reviewer understand what the point of > each patch is and how the whole system works? I think Andres and Robert > have both taken that approach before with big patches and it really helped > imho. Thank you very much for your suggestion! Following your advice, I am going to write a readers guide referring to the past posts of Andres and Rebert. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
2022年4月22日 下午1:58,Yugo NAGATA <nagata@sraoss.co.jp> 写道:On Fri, 22 Apr 2022 11:29:39 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:Hi,
On Fri, 1 Apr 2022 11:09:16 -0400
Greg Stark <stark@mit.edu> wrote:This patch has bitrotted due to some other patch affecting trigger.c.
Could you post a rebase?
This is the last week of the CF before feature freeze so time is of the essence.
I attached a rebased patch-set.
Also, I made the folowing changes from the previous.
1. Fix to not use a new deptye
In the previous patch, we introduced a new deptye 'm' into pg_depend.
This deptype was used for looking for IVM triggers to be removed at
REFRESH WITH NO DATA. However, we decided to not use it for reducing
unnecessary change in the core code. Currently, the trigger name and
dependent objclass are used at that time instead of it.
As a result, the number of patches are reduced to nine from ten.2. Bump the version numbers in psql and pg_dump
This feature's target is PG 16 now.
Sorry, I revert this change. It was too early to bump up the
version number.
--
Yugo NAGATA <nagata@sraoss.co.jp>
<v27-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch><v27-0002-Add-relisivm-column-to-pg_class-system-catalog.patch><v27-0003-Allow-to-prolong-life-span-of-transition-tables-.patch><v27-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch><v27-0005-Add-Incremental-View-Maintenance-support-to-psql.patch><v27-0006-Add-Incremental-View-Maintenance-support.patch><v27-0007-Add-aggregates-support-in-IVM.patch><v27-0008-Add-regression-tests-for-Incremental-View-Mainte.patch><v27-0009-Add-documentations-about-Incremental-View-Mainte.patch>
Hi huyajun, Thank you for your comments! On Wed, 29 Jun 2022 17:56:39 +0800 huyajun <hu_yajun@qq.com> wrote: > Hi, Nagata-san > I read your patch with v27 version and has some new comments,I want to discuss with you. > > 1. How about use DEPENDENCY_INTERNAL instead of DEPENDENCY_AUTO > when record dependence on trigger created by IMV.( related code is in the end of CreateIvmTrigger) > Otherwise, User can use sql to drop trigger and corrupt IVM, DEPENDENCY_INTERNAL is also semantically more correct > Crash case like: > create table t( a int); > create incremental materialized view s as select * from t; > drop trigger "IVM_trigger_XXXX”; > Insert into t values(1); We use DEPENDENCY_AUTO because we want to delete the triggers when REFRESH ... WITH NO DATA is performed on the materialized view in order to disable IVM. Triggers created with DEPENDENCY_INTERNAL cannot be dropped. Such triggers are re-created when REFRESH ... [WITH DATA] is performed. We can use DEPENDENCY_INTERNAL if we disable/enable such triggers instead of dropping/re-creating them, although users also can disable triggers using ALTER TRIGGER. > 2. In get_matching_condition_string, Considering NULL values, we can not use simple = operator. > But how about 'record = record', record_eq treat NULL = NULL > it should fast than current implementation for only one comparation > Below is my simple implementation with this, Variables are named arbitrarily.. > I test some cases it’s ok > > static char * > get_matching_condition_string(List *keys) > { > StringInfoData match_cond; > ListCell *lc; > > /* If there is no key columns, the condition is always true. */ > if (keys == NIL) > return "true"; > else > { > StringInfoData s1; > StringInfoData s2; > initStringInfo(&match_cond); > initStringInfo(&s1); > initStringInfo(&s2); > /* Considering NULL values, we can not use simple = operator. */ > appendStringInfo(&s1, "ROW("); > appendStringInfo(&s2, "ROW("); > foreach (lc, keys) > { > Form_pg_attribute attr = (Form_pg_attribute) lfirst(lc); > char *resname = NameStr(attr->attname); > char *mv_resname = quote_qualified_identifier("mv", resname); > char *diff_resname = quote_qualified_identifier("diff", resname); > > appendStringInfo(&s1, "%s", mv_resname); > appendStringInfo(&s2, "%s", diff_resname); > > if (lnext(lc)) > { > appendStringInfo(&s1, ", "); > appendStringInfo(&s2, ", "); > } > } > appendStringInfo(&s1, ")::record"); > appendStringInfo(&s2, ")::record"); > appendStringInfo(&match_cond, "%s operator(pg_catalog.=) %s", s1.data, s2.data); > return match_cond.data; > } > } As you say, we don't have to use IS NULL if we use ROW(...)::record, but we cannot use an index in this case and it makes IVM ineffecient. As showed bellow (#5), an index works even when we use simple = operations together with together "IS NULL" operations. > 3. Consider truncate base tables, IVM will not refresh, maybe raise an error will be better I fixed to support TRUNCATE on base tables in our repository. https://github.com/sraoss/pgsql-ivm/commit/a1365ed69f34e1adbd160f2ce8fd1e80e032392f When a base table is truncated, the view content will be empty if the view definition query does not contain an aggregate without a GROUP clause. Therefore, such views can be truncated. Aggregate views without a GROUP clause always have one row. Therefore, if a base table is truncated, the view will not be empty and will contain a row with NULL value (or 0 for count()). So, in this case, we refresh the view instead of truncating it. The next version of the patch-set will include this change. > 4. In IVM_immediate_before,I know Lock base table with ExclusiveLock is > for concurrent updates to the IVM correctly, But how about to Lock it when actually > need to maintain MV which in IVM_immediate_maintenance > In this way you don't have to lock multiple times. Yes, as you say, we don't have to lock the view multiple times. I'll investigate better locking ways including the way that you suggest. > 5. Why we need CreateIndexOnIMMV, is it a optimize? > It seems like when maintenance MV, > the index may not be used because of our match conditions can’t use simple = operator No, the index works even when we use simple = operator together with "IS NULL". For example: postgres=# \d mv Materialized view "public.mv" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- id | integer | | | v1 | integer | | | v2 | integer | | | Indexes: "mv_index" UNIQUE, btree (id) NULLS NOT DISTINCT postgres=# EXPLAIN ANALYZE WITH diff(id, v1, v2) AS MATERIALIZED ((VALUES(42, 420, NULL::int))) SELECT mv.* FROM mv, diff WHERE (mv.id = diff.id OR (mv.id IS NULL AND diff.id IS NULL)) AND (mv.v1 = diff.v1 OR (mv.v1 IS NULL AND diff.v1 IS NULL)) AND (mv.v2 = diff.v2 OR (mv.v2 IS NULL AND diff.v2 IS NULL)); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------- Nested Loop (cost=133.87..137.92 rows=1 width=12) (actual time=0.180..0.191 rows=1 loops=1) CTE diff -> Result (cost=0.00..0.01 rows=1 width=12) (actual time=0.027..0.028 rows=1 loops=1) -> CTE Scan on diff (cost=0.00..0.02 rows=1 width=12) (actual time=0.037..0.040 rows=1 loops=1) -> Bitmap Heap Scan on mv (cost=133.86..137.88 rows=1 width=12) (actual time=0.127..0.132 rows=1 loops=1) Recheck Cond: ((id = diff.id) OR (id IS NULL)) Filter: (((id = diff.id) OR ((id IS NULL) AND (diff.id IS NULL))) AND ((v1 = diff.v1) OR ((v1 IS NULL) AND (diff.v1IS NULL))) AND ((v2 = diff.v2) OR ((v2 IS NULL) AND (diff.v2 IS NULL)))) Heap Blocks: exact=1 -> BitmapOr (cost=133.86..133.86 rows=1 width=0) (actual time=0.091..0.093 rows=0 loops=1) -> Bitmap Index Scan on mv_index (cost=0.00..4.43 rows=1 width=0) (actual time=0.065..0.065 rows=1 loops=1) Index Cond: (id = diff.id) -> Bitmap Index Scan on mv_index (cost=0.00..4.43 rows=1 width=0) (actual time=0.021..0.021 rows=0 loops=1) Index Cond: (id IS NULL) Planning Time: 0.666 ms Execution Time: 0.399 ms (15 rows) Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
3. Consider truncate base tables, IVM will not refresh, maybe raise an error will be better
I fixed to support TRUNCATE on base tables in our repository.
https://github.com/sraoss/pgsql-ivm/commit/a1365ed69f34e1adbd160f2ce8fd1e80e032392f
When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated.
Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.
The next version of the patch-set will include this change.
Hello huyajun, I'm sorry for delay in my response. On Tue, 26 Jul 2022 12:00:26 +0800 huyajun <hu_yajun@qq.com> wrote: > I read your patch and think this processing is greet, but there is a risk of deadlock. > Although I have not thought of a suitable processing method for the time being, > it is also acceptable for truncate scenarios.The deadlock scene is as follows: > > Mv define is: select * from base_a,base_b; > S1: truncate base_a; ― only AccessExclusiveLock base_a and not run into after trigger > S2: insert into base_b; ― The update has been completed and the incremental refresh is started in the after trigger,RowExclusiveon base_b and ExclusiveLock on mv > S1: continue truncate mv, wait for AccessExclusiveLock on mv, wait for S2 > S2: continue refresh mv, wait for AccessShardLock on base_a, wait for S1 > So deadlock occurred Hmm, this deadlock scenario is possible, indeed. One idea to resolve it is to acquire RowExclusive locks on all base tables in the BEFORE trigger. If so, S2 can not progress its process because it waits for a RowExclusive lock on base_b, and it can not acquire ExeclusiveLock on mv before S1 finishes. > I also found some new issues that I would like to discuss with you Thank you so much for your massive bug reports! > 1. Concurrent DML causes imv data error, case like below > Setup: > Create table t( a int); > Insert into t select 1 from generate_series(1,3); > create incremental materialized view s as select count(*) from t; > > S1: begin;delete from t where ctid in (select ctid from t limit 1); > S2: begin;delete from t where ctid in (select ctid from t limit 1 offset 1); > S1: commit; > S2: commit; > > After this, The count data of s becomes 2 but correct data is 1. > I found out that the problem is probably because to our use of ctid update > Consider user behavior unrelated to imv: > > Create table t( a int); > Insert into t select 1; > s1: BEGIN > s1: update t set a = 2 where ctid in (select ctid from t); -- UPDATE 1 > s2: BEGIN > s2: update t set a = 3 where ctid in (select ctid from t); -- wait row lock > s1: COMMIT > s2: -- UPDATE 0 -- ctid change so can't UPDATE one rows > So we lost the s2 update > > 2. Sometimes it will crash when the columns of the created materialized view do not match > Create table t( a int); > create incremental materialized view s(z) as select sum(1) as a, sum(1) as b from t; > > The problem should be that colNames in rewriteQueryForIMMV does not consider this situation > > 3. Sometimes no error when the columns of the created materialized view do not match > Create table t( a int); > create incremental materialized view s(y,z) as select count(1) as b from t; > > But the hidden column of IMV is overwritten to z which will cause refresh failed. > > The problem should be that checkRuleResultList we should only skip imv hidden columns check > > 4. A unique index should not be created in the case of a Cartesian product > > create table base_a (i int primary key, j varchar); > create table base_b (i int primary key, k varchar); > INSERT INTO base_a VALUES > (1,10), > (2,20), > (3,30), > (4,40), > (5,50); > INSERT INTO base_b VALUES > (1,101), > (2,102), > (3,103), > (4,104); > CREATE incremental MATERIALIZED VIEW s as > select base_a.i,base_a.j from base_a,base_b; ― create error because of unique index I am working on above issues (#1-#4) now, and I'll respond on each later. > 5. Besides, I would like to ask you if you have considered implementing an IMV with delayed refresh? > The advantage of delayed refresh is that it will not have much impact on write performance Yes, I've been thinking to implement deferred maintenance since the beginning of this IVM project. However, we've decided to start from immediate maintenance, and will plan to propose deferred maintenance to the core after the current patch is accepted. (I plan to implement this feature in pg_ivm extension module first, though.) > I probably have some ideas about it now, do you think it works? > 1. After the base table is updated, the delayed IMV's after trigger is used to record the delta > information in another table similar to the incremental log of the base table > 2. When incremental refresh, use the data in the log instead of the data in the trasient table > of the after trigger > 3. We need to merge the incremental information in advance to ensure that the base_table > after transaction filtering UNION ALL old_delta is the state before the base table is updated > Case like below: > Create table t( a int); > ―begin to record log > Insert into t select 1; ― newlog: 1 oldlog: empty > Delete from t; ―newlog:1, oldlog:1 > ― begin to incremental refresh > Select * from t where xmin < xid or (xmin = xid and cmin < cid); ― empty > So this union all oldlog is not equal to before the base table is updated > We need merge the incremental log in advance to make newlog: empty, oldlog: empty > > If implemented, incremental refresh must still be serialized, but the DML of the base table > can not be blocked, that is to say, the base table can still record logs during incremental refresh, > as long as we use same snapshot when incrementally updating. > > do you think there will be any problems with this solution? I guess the deferred maintenance process would be basically what similar to above. Especially, as you say, we need to merge incremental information in some way before calculating deltas on the view. I investigated some research papers, but I'll review again before working on deferred approach design. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Fri, Sep 09, 2022 at 08:10:32PM +0900, Yugo NAGATA wrote: > I am working on above issues (#1-#4) now, and I'll respond on each later. Okay, well. There has been some feedback sent lately and no update for one month, so I am marking it as RwF for now. As a whole the patch has been around for three years and it does not seem that a lot has happened in terms of design discussion (now the thread is long so I would easily miss something).. -- Michael