Re: Incrementally refreshed materialized view - Mailing list pgsql-general

From Nguyễn Trần Quốc Vinh
Subject Re: Incrementally refreshed materialized view
Date
Msg-id CAAU9oxuy9EQiCPDuT=eNjpJPq0iuBArptXDNQnOfHGpBhP9+ow@mail.gmail.com
Whole thread Raw
In response to Re: Incrementally refreshed materialized view  (hariprasath nallasamy <hariprasathnallasamy@gmail.com>)
Responses Re: Incrementally refreshed materialized view
List pgsql-general
Dear,

As it was recommended, we pushed our projects into github: https://github.com/ntqvinh/PgMvIncrementalUpdate. 

1) Synchronous incremental update
- For-each-row triggers are generated for all changing events on all underlying tables.

2) Asynchronous (deferred) incremental update
- Triggers are generated for all changing events on all underlying tables to collect all changed rows
- Other codes are generated for each matview-query to update the matview.

 We hope that our projects may be helpful for someone!




TS. Nguyễn Trần Quốc Vinh
-----------------------------------------------
Chủ nhiệm khoa Tin học
Trường ĐH Sư phạm - ĐH Đà Nẵng
ĐT: 0511.6-512-586
DĐ: 0914.78-08-98
------------------------------------------------
Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98

On Tue, Sep 27, 2016 at 12:23 PM, hariprasath nallasamy <hariprasathnallasamy@gmail.com> wrote:
We also tried to achieve incremental refresh of materialized view and our solution doesn't solve all of the use cases.

Players:
1) WAL
2) Logical decoding 
3) replication slots 
4) custom background worker

Two kinds of approaches :
1. Deferred refresh (oracle type of creating log table for each base tables with its PK and agg's columns old and new values)
      a) Log table for each base table has to be created and this log table will keep track of delta changes. 
      b) UDF is called to refresh the view incrementally - this will run original materialized view query with the tracked delta PK's in their where clause. so only rows that are modified/inserted will be touched.
      c) Log table will keep track of changed rows from the data given by replication slot which uses logical decoding to decode from WAL.
      d) Shared memory is used to maintain the relationship between the view and its base table. In case of restart they are pushed to maintenance table.

2. RealTime refresh (update the view whenever we get any change-sets related to that base tables)
      a) Delta data from the replication slot will be applied to view by checking the relationship between our delta data and the view definiton. Here also shared memory and maintenance table are used.
      b) Work completed only for materialized views having single table.

Main disadvantage : 
1) Data inconsistency when master failure and also slave doesn't have replication slot as of now. But 2ndquard guys try to create slots in slave using some concepts of failover slots. But that doesn't come along with PG :(. 
2) Sum, count and avg are implemented for aggregates(single table) and for other aggs full refresh comes to play a role.
3) Right join implementation requires more queries to run on the top of MV's.

So we are on a long way to go and dono whether this is the right path.

Only deferred refresh was pushed to github.

I wrote a post regarding that in medium.



pgsql-general by date:

Previous
From: Job
Date:
Subject: Out of memory in pg_bulkload
Next
From: Adam Brusselback
Date:
Subject: Re: Incrementally refreshed materialized view