Re: Incrementally refreshed materialized view - Mailing list pgsql-general

From hariprasath nallasamy
Subject Re: Incrementally refreshed materialized view
Date
Msg-id CAGgejVw9D8PMqd6qifDLmps6_JPV+9+Zm9WN14bAfmUPk==n5A@mail.gmail.com
Whole thread Raw
In response to Re: Incrementally refreshed materialized view  (Kevin Grittner <kgrittn@gmail.com>)
Responses Re: Incrementally refreshed materialized view  (Nguyễn Trần Quốc Vinh <ntquocvinh@gmail.com>)
List pgsql-general
We also tried to achieve incremental refresh of materialized view and our solution doesn't solve all of the use cases.

Players:
1) WAL
2) Logical decoding 
3) replication slots 
4) custom background worker

Two kinds of approaches :
1. Deferred refresh (oracle type of creating log table for each base tables with its PK and agg's columns old and new values)
      a) Log table for each base table has to be created and this log table will keep track of delta changes. 
      b) UDF is called to refresh the view incrementally - this will run original materialized view query with the tracked delta PK's in their where clause. so only rows that are modified/inserted will be touched.
      c) Log table will keep track of changed rows from the data given by replication slot which uses logical decoding to decode from WAL.
      d) Shared memory is used to maintain the relationship between the view and its base table. In case of restart they are pushed to maintenance table.

2. RealTime refresh (update the view whenever we get any change-sets related to that base tables)
      a) Delta data from the replication slot will be applied to view by checking the relationship between our delta data and the view definiton. Here also shared memory and maintenance table are used.
      b) Work completed only for materialized views having single table.

Main disadvantage : 
1) Data inconsistency when master failure and also slave doesn't have replication slot as of now. But 2ndquard guys try to create slots in slave using some concepts of failover slots. But that doesn't come along with PG :(. 
2) Sum, count and avg are implemented for aggregates(single table) and for other aggs full refresh comes to play a role.
3) Right join implementation requires more queries to run on the top of MV's.

So we are on a long way to go and dono whether this is the right path.

Only deferred refresh was pushed to github.

I wrote a post regarding that in medium.


pgsql-general by date:

Previous
From: raf
Date:
Subject: Frequent "pg_ctl status" removing(?) semaphores (unlikely)
Next
From: "Marek Petr"
Date:
Subject: lost synchronization with server: got message type "Z"