Re: Implementing Incremental View Maintenance - Mailing list pgsql-hackers

From nuko yokohama
Subject Re: Implementing Incremental View Maintenance
Date
Msg-id CAF3Gu1YyBSXT8rjJJK_MqNS02An=q6U8sCDR0PRUCaMwzocyNA@mail.gmail.com
Whole thread Raw
In response to Implementing Incremental View Maintenance  (Yugo Nagata <nagata@sraoss.co.jp>)
Responses Re: Implementing Incremental View Maintenance
List pgsql-hackers
Hi.

I found the problem after running "ALTER MATERIALIZED VIEW ... RENAME TO".
If a view created with "CREATE INCREMENT MATERIALIZED VIEW" is renamed,
subsequent INSERT operations to the base table will fail.

Error message.
```
ERROR:  could not open relation with OID 0
```

Execution log.
```
[ec2-user@ip-10-0-1-10 ivm]$ psql -U postgres test -e -f ~/test/ivm/alter_rename_bug.sql
DROP TABLE IF EXISTS table_x CASCADE;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:1: NOTICE:  drop cascades to materialized view group_imv
DROP TABLE
CREATE TABLE table_x AS
   SELECT generate_series(1, 10000) AS id,
   ROUND(random()::numeric * 100, 2) AS data,
   CASE (random() * 5)::integer
     WHEN 4 THEN 'group-a'
     WHEN 3 THEN 'group-b'
     ELSE 'group-c'
   END AS part_key
;
SELECT 10000
               Table "public.table_x"
  Column  |  Type   | Collation | Nullable | Default
----------+---------+-----------+----------+---------
 id       | integer |           |          |
 data     | numeric |           |          |
 part_key | text    |           |          |

DROP MATERIALIZED VIEW IF EXISTS group_imv;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:15: NOTICE:  materialized view "group_imv" does not exist, skipping
DROP MATERIALIZED VIEW
CREATE INCREMENTAL MATERIALIZED VIEW group_imv AS
SELECT part_key, COUNT(*), MAX(data), MIN(data), SUM(data), AVG(data)
FROM table_x
GROUP BY part_key;
SELECT 3
                 List of relations
 Schema |   Name    |       Type        |  Owner
--------+-----------+-------------------+----------
 public | group_imv | materialized view | postgres
 public | table_x   | table             | postgres
(2 rows)

             Materialized view "public.group_imv"
      Column       |  Type   | Collation | Nullable | Default
-------------------+---------+-----------+----------+---------
 part_key          | text    |           |          |
 count             | bigint  |           |          |
 max               | numeric |           |          |
 min               | numeric |           |          |
 sum               | numeric |           |          |
 avg               | numeric |           |          |
 __ivm_count_max__ | bigint  |           |          |
 __ivm_count_min__ | bigint  |           |          |
 __ivm_count_sum__ | bigint  |           |          |
 __ivm_count_avg__ | bigint  |           |          |
 __ivm_sum_avg__   | numeric |           |          |
 __ivm_count__     | bigint  |           |          |

SELECT * FROM group_imv ORDER BY part_key;
 part_key | count |  max  | min  |    sum    |         avg
----------+-------+-------+------+-----------+---------------------
 group-a  |  1966 | 99.85 | 0.05 |  98634.93 | 50.1703611393692777
 group-b  |  2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
 group-c  |  6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
(3 rows)

ALTER MATERIALIZED VIEW group_imv RENAME TO group_imv2;
ALTER MATERIALIZED VIEW
                 List of relations
 Schema |    Name    |       Type        |  Owner
--------+------------+-------------------+----------
 public | group_imv2 | materialized view | postgres
 public | table_x    | table             | postgres
(2 rows)

            Materialized view "public.group_imv2"
      Column       |  Type   | Collation | Nullable | Default
-------------------+---------+-----------+----------+---------
 part_key          | text    |           |          |
 count             | bigint  |           |          |
 max               | numeric |           |          |
 min               | numeric |           |          |
 sum               | numeric |           |          |
 avg               | numeric |           |          |
 __ivm_count_max__ | bigint  |           |          |
 __ivm_count_min__ | bigint  |           |          |
 __ivm_count_sum__ | bigint  |           |          |
 __ivm_count_avg__ | bigint  |           |          |
 __ivm_sum_avg__   | numeric |           |          |
 __ivm_count__     | bigint  |           |          |

SET client_min_messages = debug5;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:30: DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
SET
INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2),
'gruop_d');
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG:  relation "public.group_imv" does not exist
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG:  relation "public.group_imv" does not exist
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: ERROR:  could not open relation with OID 0
RESET client_min_messages;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:34: DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
RESET
SELECT * FROM group_imv2 ORDER BY part_key;
 part_key | count |  max  | min  |    sum    |         avg
----------+-------+-------+------+-----------+---------------------
 group-a  |  1966 | 99.85 | 0.05 |  98634.93 | 50.1703611393692777
 group-b  |  2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
 group-c  |  6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
(3 rows)

ALTER MATERIALIZED VIEW group_imv2 RENAME TO group_imv;
ALTER MATERIALIZED VIEW
INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2),
'gruop_d');
INSERT 0 1
SELECT * FROM group_imv ORDER BY part_key;
 part_key | count |  max  |  min  |    sum    |         avg
----------+-------+-------+-------+-----------+---------------------
 group-a  |  1966 | 99.85 |  0.05 |  98634.93 | 50.1703611393692777
 group-b  |  2021 | 99.99 |  0.17 | 102614.02 | 50.7738842157347848
 group-c  |  6013 | 99.99 |  0.02 | 300968.43 | 50.0529569266589057
 gruop_d  |     1 | 81.43 | 81.43 |     81.43 | 81.4300000000000000
(4 rows)

[ec2-user@ip-10-0-1-10 ivm]$
```

This may be because IVM internal information is not modified when the view name is renamed.

2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
Hi,

I would like to implement Incremental View Maintenance (IVM) on PostgreSQL. 
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.

I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views. 
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].

In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.

However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.

For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys. 

We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance.  In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.

However, the discussion about IVM is now stoped, so we would like to restart and
progress this.


Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.

1. How to extract changes on base tables

I think there would be at least two approaches for it.

 - Using transition table in AFTER triggers
 - Extracting changes from WAL using logical decoding

In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.

If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.


2. How to compute the delta to be applied to materialized views

Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations".  For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.

There could be several operations for view definition: selection, projection,
join,  aggregation, union, difference, intersection, etc.  If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.


3. How to identify rows to be modifed in materialized views

When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta.  A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables.  In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.


4. When to maintain materialized views

There are two candidates of the timing of maintenance, immediate (eager) or deferred.

In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.

In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.

In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].


[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Block level parallel vacuum
Next
From: nuko yokohama
Date:
Subject: Re: Implementing Incremental View Maintenance