Re: Implementing Incremental View Maintenance - Mailing list pgsql-hackers
From | Yugo Nagata |
---|---|
Subject | Re: Implementing Incremental View Maintenance |
Date | |
Msg-id | 20190927171742.410211e0b6ab3f6dd9bd2287@sraoss.co.jp Whole thread Raw |
In response to | Re: Implementing Incremental View Maintenance (Paul Draper <paulddraper@gmail.com>) |
List | pgsql-hackers |
On Tue, 17 Sep 2019 12:03:20 -0600 Paul Draper <paulddraper@gmail.com> wrote: > Have you had any thoughts for more than two joined tables? > > Either there needs to be an quadratic number of joins, or intermediate join > results need to be stored and reused. I don't think that we need to store intermediate join results. Suppose that we have a view V joining table R,S, and new tuples are inserted to each table, dR,dS, and dT respectively. V = R*S*T R_new = R + dR S_new = S + dS T_new = T + dT In this situation, we can calculate the new view state as bellow. V_new = R_new * S_new * T_new = (R + dR) * (S + dS) * (T + dT) = R*S*T + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT = V + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT Although the number of terms is 2^3(=8), if we can use both of pre-update state (eg. R) and post-update state (eg. R+dR), we only need only three joins. Actually, post-update state is available in AFTER trigger, and pre-update state can be calculated by using delta tables (transition tables) and cmin/xmin system columns (or snapshot). This is the approach my implementation adopts. > > On Tue, Sep 17, 2019 at 8:50 AM Yugo Nagata <nagata@sraoss.co.jp> wrote: > > > Hi Paul, > > > > Thank you for your suggestion. > > > > On Sun, 15 Sep 2019 11:52:22 -0600 > > Paul Draper <paulddraper@gmail.com> wrote: > > > > > As I understand it, the current patch performs immediate IVM using AFTER > > > STATEMENT trigger transition tables. > > > > > > However, multiple tables can be modified *before* AFTER STATEMENT > > triggers > > > are fired. > > > > > > CREATE TABLE example1 (a int); > > > CREATE TABLE example2 (a int); > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW mv AS > > > SELECT example1.a, example2.a > > > FROM example1 JOIN example2 ON a; > > > > > > WITH > > > insert1 AS (INSERT INTO example1 VALUES (1)), > > > insert2 AS (INSERT INTO example2 VALUES (1)) > > > SELECT NULL; > > > > > > Changes to example1 are visible in an AFTER STATEMENT trigger on > > example2, > > > and vice versa. Would this not result in the (1, 1) tuple being > > > "double-counted"? > > > > > > IVM needs to either: > > > > > > (1) Evaluate deltas "serially' (e.g. EACH ROW triggers) > > > > > > (2) Have simultaneous access to multiple deltas: > > > delta_mv = example1 x delta_example2 + example2 x delta_example1 - > > > delta_example1 x delta_example2 > > > > > > This latter method is the "logged" approach that has been discussed for > > > deferred evaluation. > > > > > > tl;dr It seems that AFTER STATEMENT triggers required a deferred-like > > > implementation anyway. > > > > You are right, the latest patch doesn't support the situation where > > multiple tables are modified in a query. I noticed this when working > > on self-join, which also virtually need to handle multiple table > > modification. > > > > I am now working on this issue and the next patch will enable to handle > > this situation. I plan to submit the patch during this month. Roughly > > speaking, in the new implementation, AFTER STATEMENT triggers are used to > > collect information of modified table and its changes (= transition > > tables), > > and then the only last trigger updates the view. This will avoid the > > double-counting. I think this implementation also would be a base of > > deferred approach implementation in future where "logs" are used instead > > of transition tables. > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > -- Yugo Nagata <nagata@sraoss.co.jp>
pgsql-hackers by date: