Re: Implementing Incremental View Maintenance - Mailing list pgsql-hackers
From | Yugo NAGATA |
---|---|
Subject | Re: Implementing Incremental View Maintenance |
Date | |
Msg-id | 20200413141835.04fcdf6424985369554b6a4d@sraoss.co.jp Whole thread Raw |
In response to | Re: Implementing Incremental View Maintenance (Yugo NAGATA <nagata@sraoss.co.jp>) |
Responses |
Re: Implementing Incremental View Maintenance
Re: Implementing Incremental View Maintenance |
List | pgsql-hackers |
On Fri, 10 Apr 2020 23:26:58 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > Hi, > > Attached is the latest patch (v15) to add support for Incremental Materialized > View Maintenance (IVM). It is possible to apply to current latest master branch. I found a mistake of splitting patch, so I attached the fixed patch (v15a). > Differences from the previous patch (v14) include: > > * Fix to not use generate_series when views are queried > > In the previous implementation, multiplicity of each tuple was stored > in ivm_count column in views. When SELECT was issued for views with > duplicate, the view was replaced with a subquery in which each tuple > was joined with generate_series function in order to output tuples > of the number of ivm_count. > > This was problematic for following reasons: > > - The overhead was huge. When almost of tuples in a view were selected, > it took much longer time than the original query. This lost the meaning > of materialized views. > > - Optimizer could not estimate row numbers correctly because this had to > know ivm_count values stored in tuples. > > - System columns of materialized views like cmin, xmin, xmax could not > be used because a view was replaced with a subquery. > > To resolve this, the new implementation doen't store multiplicities > for views with tuple duplicates, and doesn't use generate_series > when SELECT query is issued for such views. > > Note that we still have to use ivm_count for supporting DISTINCT and > aggregates. I also explain the way of updating views with tuple duplicates. Although a view itself doesn't have ivm_count column, multiplicities for old delta and new delta are calculated and the count value is contained in a column named __ivm_count__ in each delta table. The old delta table is applied using ctid and row_number function. row_number is used to numbering tuples in the view, and tuples whose number is equal or is less than __ivm_count__ are deleted from the view using a query like: DELETE FROM matviewname WHERE ctid IN ( SELECT tid FROM ( SELECT row_number() over (partition by c1, c2, ...) AS __ivm_row_number__, mv.ctid AS tid, diff.__ivm_count__ FROM matviewname AS mv, old_delta AS diff " WHERE mv.c1 = diff.c1 AND mv.c2 = diff.c2 AND ... ) v WHERE v.__ivm_row_number__ <= v.__ivm_count__ The new delta is applied using generate_seriese to insert mutiple same tuples, using a query like: INSERT INTO matviewname (c1, c2, ...) SELECT c1,c2,... FROM ( SELECT diff.*, generate_series( > > * Add query checks for IVM restrictions > > Query checks for following restrictions are added: > > - DISTINCT ON > - TABLESAMPLE parameter > - inheritance parent table > - window function > - some aggregate options(such as FILTER, DISTINCT, ORDER and GROUPING SETS) > - targetlist containing IVM column > - simple subquery is only supported > - FOR UPDATE/SHARE > - empty target list > - UNION/INTERSECT/EXCEPT > - GROUPING SETS clauses > > * Improve error messages > > Add error code ERRCODE_FEATURE_NOT_SUPPORTED to each IVM error message. > Also, the message format was unified. > > * Support subqueries containig joins in FROM clause > > Previously, when multi tables are updated simultaneously, incremental > view maintenance with subqueries including JOIN didn't work correctly > due to a bug. > > Best Regards, > Takuma Hoshiai > > -- > Yugo NAGATA <nagata@sraoss.co.jp> -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
pgsql-hackers by date: