Re: Implementing Incremental View Maintenance - Mailing list pgsql-hackers
From | Yugo Nagata |
---|---|
Subject | Re: Implementing Incremental View Maintenance |
Date | |
Msg-id | 20191223100716.af0796faef3a1bd94890b3a3@sraoss.co.jp Whole thread Raw |
In response to | Re: Implementing Incremental View Maintenance (nuko yokohama <nuko.yokohama@gmail.com>) |
List | pgsql-hackers |
On Sun, 22 Dec 2019 20:54:41 +0900 nuko yokohama <nuko.yokohama@gmail.com> wrote: > SELECT statement that is not IMMUTABLE must not be specified when creating > a view. > > An expression SELECT statement that is not IMMUTABLE must not be specified > when creating a view. > > In the current implementation, a SELECT statement containing an expression > that is not IMMUTABLE can be specified when creating a view. > If an incremental materialized view is created from a SELECT statement that > contains an expression that is not IMMUTABLE, applying the SELECT statement > to the view returns incorrect results. > To prevent this, we propose that the same error occur when a non-IMMUTABLE > expression is specified in the "CREATE INDEX" statement. Thank you for pointing out this. That makes sense. The check of not-IMMUTABLE epressions is missing at creating IMMV. We'll add this. Thanks, Yugo Nagata > > The following is an inappropriate example. > ---- > CREATE TABLE base (id int primary key, data text, ts timestamp); > CREATE TABLE > CREATE VIEW base_v AS SELECT * FROM base > WHERE ts >= (now() - '3 second'::interval); > CREATE VIEW > CREATE MATERIALIZED VIEW base_mv AS SELECT * FROM base > WHERE ts >= (now() - '3 second'::interval); > SELECT 0 > CREATE INCREMENTAL MATERIALIZED VIEW base_imv AS SELECT * FROM base > WHERE ts >= (now() - '3 second'::interval); > SELECT 0 > View "public.base_v" > Column | Type | Collation | Nullable | Default | > Storage | Description > --------+-----------------------------+-----------+----------+---------+----------+------------- > id | integer | | | | > plain | > data | text | | | | > extended | > ts | timestamp without time zone | | | | > plain | > View definition: > SELECT base.id, > base.data, > base.ts > FROM base > WHERE base.ts >= (now() - '00:00:03'::interval); > > Materialized view "public.base_mv" > Column | Type | Collation | Nullable | Default | > Storage | Stats target | Description > --------+-----------------------------+-----------+----------+---------+----------+--------------+------------- > id | integer | | | | > plain | | > data | text | | | | > extended | | > ts | timestamp without time zone | | | | > plain | | > View definition: > SELECT base.id, > base.data, > base.ts > FROM base > WHERE base.ts >= (now() - '00:00:03'::interval); > Access method: heap > > Materialized view "public.base_imv" > Column | Type | Collation | Nullable | > Default | Storage | Stats target | Description > ---------------+-----------------------------+-----------+----------+---------+----------+--------------+------------- > id | integer | | | > | plain | | > data | text | | | > | extended | | > ts | timestamp without time zone | | | > | plain | | > __ivm_count__ | bigint | | | > | plain | | > View definition: > SELECT base.id, > base.data, > base.ts > FROM base > WHERE base.ts >= (now() - '00:00:03'::interval); > Access method: heap > Incremental view maintenance: yes > > INSERT INTO base VALUES (generate_series(1,3), 'dummy', clock_timestamp()); > INSERT 0 3 > SELECT * FROM base_v ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > (3 rows) > > SELECT * FROM base_mv ORDER BY id; > id | data | ts > ----+------+---- > (0 rows) > > REFRESH MATERIALIZED VIEW base_mv; > REFRESH MATERIALIZED VIEW > SELECT * FROM base_mv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > (3 rows) > > SELECT * FROM base_imv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > (3 rows) > > SELECT pg_sleep(3); > pg_sleep > ---------- > > (1 row) > > INSERT INTO base VALUES (generate_series(4,6), 'dummy', clock_timestamp()); > INSERT 0 3 > SELECT * FROM base_v ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 4 | dummy | 2019-12-22 11:38:29.381414 > 5 | dummy | 2019-12-22 11:38:29.381441 > 6 | dummy | 2019-12-22 11:38:29.381444 > (3 rows) > > SELECT * FROM base_mv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > (3 rows) > > REFRESH MATERIALIZED VIEW base_mv; > REFRESH MATERIALIZED VIEW > SELECT * FROM base_mv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 4 | dummy | 2019-12-22 11:38:29.381414 > 5 | dummy | 2019-12-22 11:38:29.381441 > 6 | dummy | 2019-12-22 11:38:29.381444 > (3 rows) > > SELECT * FROM base_imv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > 4 | dummy | 2019-12-22 11:38:29.381414 > 5 | dummy | 2019-12-22 11:38:29.381441 > 6 | dummy | 2019-12-22 11:38:29.381444 > (6 rows) > > REFRESH MATERIALIZED VIEW base_mv; > REFRESH MATERIALIZED VIEW > SELECT * FROM base_imv ORDER BY id; > id | data | ts > ----+-------+---------------------------- > 1 | dummy | 2019-12-22 11:38:26.367481 > 2 | dummy | 2019-12-22 11:38:26.367599 > 3 | dummy | 2019-12-22 11:38:26.367606 > 4 | dummy | 2019-12-22 11:38:29.381414 > 5 | dummy | 2019-12-22 11:38:29.381441 > 6 | dummy | 2019-12-22 11:38:29.381444 > (6 rows) > ---- > > 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>: > > > Hi, > > > > I would like to implement Incremental View Maintenance (IVM) on > > PostgreSQL. > > IVM is a technique to maintain materialized views which computes and > > applies > > only the incremental changes to the materialized views rather than > > recomputate the contents as the current REFRESH command does. > > > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > > [1]. > > Our implementation uses row OIDs to compute deltas for materialized > > views. > > The basic idea is that if we have information about which rows in base > > tables > > are contributing to generate a certain row in a matview then we can > > identify > > the affected rows when a base table is updated. This is based on an idea of > > Dr. Masunaga [2] who is a member of our group and inspired from ID-based > > approach[3]. > > > > In our implementation, the mapping of the row OIDs of the materialized view > > and the base tables are stored in "OID map". When a base relation is > > modified, > > AFTER trigger is executed and the delta is recorded in delta tables using > > the transition table feature. The accual udpate of the matview is triggerd > > by REFRESH command with INCREMENTALLY option. > > > > However, we realize problems of our implementation. First, WITH OIDS will > > be removed since PG12, so OIDs are no longer available. Besides this, it > > would > > be hard to implement this since it needs many changes of executor nodes to > > collect base tables's OIDs during execuing a query. Also, the cost of > > maintaining > > OID map would be high. > > > > For these reasons, we started to think to implement IVM without relying on > > OIDs > > and made a bit more surveys. > > > > We also looked at Kevin Grittner's discussion [4] on incremental matview > > maintenance. In this discussion, Kevin proposed to use counting algorithm > > [5] > > to handle projection views (using DISTNICT) properly. This algorithm need > > an > > additional system column, count_t, in materialized views and delta tables > > of > > base tables. > > > > However, the discussion about IVM is now stoped, so we would like to > > restart and > > progress this. > > > > > > Through our PoC inplementation and surveys, I think we need to think at > > least > > the followings for implementing IVM. > > > > 1. How to extract changes on base tables > > > > I think there would be at least two approaches for it. > > > > - Using transition table in AFTER triggers > > - Extracting changes from WAL using logical decoding > > > > In our PoC implementation, we used AFTER trigger and transition tables, > > but using > > logical decoding might be better from the point of performance of base > > table > > modification. > > > > If we can represent a change of UPDATE on a base table as query-like > > rather than > > OLD and NEW, it may be possible to update the materialized view directly > > instead > > of performing delete & insert. > > > > > > 2. How to compute the delta to be applied to materialized views > > > > Essentially, IVM is based on relational algebra. Theorically, changes on > > base > > tables are represented as deltas on this, like "R <- R + dR", and the > > delta on > > the materialized view is computed using base table deltas based on "change > > propagation equations". For implementation, we have to derive the > > equation from > > the view definition query (Query tree, or Plan tree?) and describe this as > > SQL > > query to compulte delta to be applied to the materialized view. > > > > There could be several operations for view definition: selection, > > projection, > > join, aggregation, union, difference, intersection, etc. If we can > > prepare a > > module for each operation, it makes IVM extensable, so we can start a > > simple > > view definition, and then support more complex views. > > > > > > 3. How to identify rows to be modifed in materialized views > > > > When applying the delta to the materialized view, we have to identify > > which row > > in the matview is corresponding to a row in the delta. A naive method is > > matching > > by using all columns in a tuple, but clearly this is unefficient. If > > thematerialized > > view has unique index, we can use this. Maybe, we have to force > > materialized views > > to have all primary key colums in their base tables. In our PoC > > implementation, we > > used OID to identify rows, but this will be no longer available as said > > above. > > > > > > 4. When to maintain materialized views > > > > There are two candidates of the timing of maintenance, immediate (eager) > > or deferred. > > > > In eager maintenance, the materialized view is updated in the same > > transaction > > where the base table is updated. In deferred maintenance, this is done > > after the > > transaction is commited, for example, when view is accessed, as a response > > to user > > request, etc. > > > > In the previous discussion[4], it is planned to start from "eager" > > approach. In our PoC > > implementaion, we used the other aproach, that is, using REFRESH command > > to perform IVM. > > I am not sure which is better as a start point, but I begin to think that > > the eager > > approach may be more simple since we don't have to maintain base table > > changes in other > > past transactions. > > > > In the eager maintenance approache, we have to consider a race condition > > where two > > different transactions change base tables simultaneously as discussed in > > [4]. > > > > > > [1] > > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ > > [2] > > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 > > (Japanese only) > > [3] https://dl.acm.org/citation.cfm?id=2750546 > > [4] > > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com > > [5] https://dl.acm.org/citation.cfm?id=170066 > > > > Regards, > > -- > > Yugo Nagata <nagata@sraoss.co.jp> > > > > -- Yugo Nagata <nagata@sraoss.co.jp>
pgsql-hackers by date: