Re: Implementing Incremental View Maintenance - Mailing list pgsql-hackers

From Yugo Nagata
Subject Re: Implementing Incremental View Maintenance
Date
Msg-id 20190927171742.410211e0b6ab3f6dd9bd2287@sraoss.co.jp
Whole thread Raw
In response to Re: Implementing Incremental View Maintenance  (Paul Draper <paulddraper@gmail.com>)
List pgsql-hackers
On Tue, 17 Sep 2019 12:03:20 -0600
Paul Draper <paulddraper@gmail.com> wrote:

> Have you had any thoughts for more than two joined tables?
> 
> Either there needs to be an quadratic number of joins, or intermediate join
> results need to be stored and reused.

I don't think that we need to store intermediate join results.

Suppose that  we have a view V joining table R,S, and new tuples are inserted
to each table, dR,dS, and dT respectively.

 V = R*S*T
 R_new = R + dR
 S_new = S + dS
 T_new = T + dT

In this situation, we can calculate the new view state as bellow.

V_new 
= R_new * S_new * T_new
= (R + dR) * (S + dS) * (T + dT)
= R*S*T + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT
= V + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT

Although the number of terms is 2^3(=8), if we can use both of pre-update
state (eg. R) and post-update state (eg. R+dR), we only need only three joins.
Actually, post-update state is available in AFTER trigger, and pre-update state
can be calculated by using delta tables (transition tables) and cmin/xmin system
columns (or snapshot). This is the approach my implementation adopts.


> 
> On Tue, Sep 17, 2019 at 8:50 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
> > Hi Paul,
> >
> > Thank you for your suggestion.
> >
> > On Sun, 15 Sep 2019 11:52:22 -0600
> > Paul Draper <paulddraper@gmail.com> wrote:
> >
> > > As I understand it, the current patch performs immediate IVM using AFTER
> > > STATEMENT trigger transition tables.
> > >
> > > However, multiple tables can be modified *before* AFTER STATEMENT
> > triggers
> > > are fired.
> > >
> > > CREATE TABLE example1 (a int);
> > > CREATE TABLE example2 (a int);
> > >
> > > CREATE INCREMENTAL MATERIALIZED VIEW mv AS
> > > SELECT example1.a, example2.a
> > > FROM example1 JOIN example2 ON a;
> > >
> > > WITH
> > >   insert1 AS (INSERT INTO example1 VALUES (1)),
> > >   insert2 AS (INSERT INTO example2 VALUES (1))
> > > SELECT NULL;
> > >
> > > Changes to example1 are visible in an AFTER STATEMENT trigger on
> > example2,
> > > and vice versa. Would this not result in the (1, 1) tuple being
> > > "double-counted"?
> > >
> > > IVM needs to either:
> > >
> > > (1) Evaluate deltas "serially' (e.g. EACH ROW triggers)
> > >
> > > (2) Have simultaneous access to multiple deltas:
> > > delta_mv = example1 x delta_example2 + example2 x delta_example1 -
> > > delta_example1 x delta_example2
> > >
> > > This latter method is the "logged" approach that has been discussed for
> > > deferred evaluation.
> > >
> > > tl;dr It seems that AFTER STATEMENT triggers required a deferred-like
> > > implementation anyway.
> >
> > You are right,  the latest patch doesn't support the situation where
> > multiple tables are modified in a query. I noticed this when working
> > on self-join, which also virtually need to handle multiple table
> > modification.
> >
> > I am now working on this issue and the next patch will enable to handle
> > this situation. I plan to submit the patch during this month. Roughly
> > speaking, in the new implementation, AFTER STATEMENT triggers are used to
> > collect  information of modified table and its changes (= transition
> > tables),
> > and then the only last trigger updates the view. This will avoid the
> > double-counting. I think this implementation also would be a base of
> > deferred approach implementation in future where "logs" are used instead
> > of transition tables.
> >
> > Regards,
> > Yugo Nagata
> >
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >


-- 
Yugo Nagata <nagata@sraoss.co.jp>



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: pg_wal/RECOVERYHISTORY file remains after archive recovery
Next
From: Michael Paquier
Date:
Subject: Re: [PATCH] get rid of StdRdOptions, use individual binaryreloptions representation for each relation kind instead