Re: Incremental View Maintenance, take 2 - Mailing list pgsql-hackers
From | jian he |
---|---|
Subject | Re: Incremental View Maintenance, take 2 |
Date | |
Msg-id | CACJufxEoCCJE1vntJp1SWjen8vBUa3vZLgL=swPwar4zim976g@mail.gmail.com Whole thread Raw |
In response to | Re: Incremental View Maintenance, take 2 (Yugo NAGATA <nagata@sraoss.co.jp>) |
Responses |
Re: Incremental View Maintenance, take 2
Re: Incremental View Maintenance, take 2 |
List | pgsql-hackers |
hi based on v29. based on https://stackoverflow.com/a/4014981/1560347: I added a new function append_update_set_caluse, and deleted functions: {append_set_clause_for_count, append_set_clause_for_sum, append_set_clause_for_avg, append_set_clause_for_minmax} I guess this way is more extensible/generic than yours. replaced the following code with the generic function: append_update_set_caluse. + /* For views with aggregates, we need to build SET clause for updating aggregate + * values. */ + if (query->hasAggs && IsA(tle->expr, Aggref)) + { + Aggref *aggref = (Aggref *) tle->expr; + const char *aggname = get_func_name(aggref->aggfnoid); + + /* + * We can use function names here because it is already checked if these + * can be used in IMMV by its OID at the definition time. + */ + + /* count */ + if (!strcmp(aggname, "count")) + append_set_clause_for_count(resname, aggs_set_old, aggs_set_new, aggs_list_buf); + + /* sum */ + else if (!strcmp(aggname, "sum")) + append_set_clause_for_sum(resname, aggs_set_old, aggs_set_new, aggs_list_buf); + + /* avg */ + else if (!strcmp(aggname, "avg")) + append_set_clause_for_avg(resname, aggs_set_old, aggs_set_new, aggs_list_buf, + format_type_be(aggref->aggtype)); + + else + elog(ERROR, "unsupported aggregate function: %s", aggname); + } ----------------------<<< attached is my refactor. there is some whitespace errors in the patches, you need use git apply --reject --whitespace=fix basedon_v29_matview_c_refactor_update_set_clause.patch Also you patch cannot use git apply, i finally found out bulk apply using gnu patch from https://serverfault.com/questions/102324/apply-multiple-patch-files. previously I just did it manually one by one. I think if you use { for i in $PATCHES/v29*.patch; do patch -p1 < $i; done } GNU patch, it will generate an .orig file for every modified file? -----------------<<<<< src/backend/commands/matview.c 2268: /* For tuple deletion */ maybe "/* For tuple deletion and update*/" is more accurate? -----------------<<<<< currently at here: src/test/regress/sql/incremental_matview.sql 98: -- support SUM(), COUNT() and AVG() aggregate functions 99: BEGIN; 100: CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i, SUM(j), COUNT(i), AVG(j) FROM mv_base_a GROUP BY i; 101: SELECT * FROM mv_ivm_agg ORDER BY 1,2,3,4; 102: INSERT INTO mv_base_a VALUES(2,100); src/backend/commands/matview.c 2858: if (SPI_exec(querybuf.data, 0) != SPI_OK_INSERT) 2859: elog(ERROR, "SPI_exec failed: %s", querybuf.data); then I debug, print out querybuf.data: WITH updt AS (UPDATE public.mv_ivm_agg AS mv SET __ivm_count__ = mv.__ivm_count__ OPERATOR(pg_catalog.+) diff.__ivm_count__ , sum = (CASE WHEN mv.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.sum IS NULL THEN diff.sum WHEN diff.sum IS NULL THEN mv.sum ELSE (mv.sum OPERATOR(pg_catalog.+) diff.sum) END), __ivm_count_sum__ = (mv.__ivm_count_sum__ OPERATOR(pg_catalog.+) diff.__ivm_count_sum__), count = (mv.count OPERATOR(pg_catalog.+) diff.count), avg = (CASE WHEN mv.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.__ivm_sum_avg__ IS NULL THEN diff.__ivm_sum_avg__ WHEN diff.__ivm_sum_avg__ IS NULL THEN mv.__ivm_sum_avg__ ELSE (mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+) diff.__ivm_sum_avg__)::numeric END) OPERATOR(pg_catalog./) (mv.__ivm_count_avg__ OPERATOR(pg_catalog.+) diff.__ivm_count_avg__), __ivm_sum_avg__ = (CASE WHEN mv.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.__ivm_sum_avg__ IS NULL THEN diff.__ivm_sum_avg__ WHEN diff.__ivm_sum_avg__ IS NULL THEN mv.__ivm_sum_avg__ ELSE (mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+) diff.__ivm_sum_avg__) END), __ivm_count_avg__ = (mv.__ivm_count_avg__ OPERATOR(pg_catalog.+) diff.__ivm_count_avg__) FROM new_delta AS diff WHERE (mv.i OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i IS NULL)) RETURNING mv.i) INSERT INTO public.mv_ivm_agg (i, sum, count, avg, __ivm_count_sum__, __ivm_count_avg__, __ivm_sum_avg__, __ivm_count__) SELECT i, sum, count, avg, __ivm_count_sum__, __ivm_count_avg__, __ivm_sum_avg__, __ivm_count__ FROM new_delta AS diff WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE (mv.i OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i IS NULL))); At this final SPI_exec, we have a update statement with related columns { __ivm_count_sum__, sum, __ivm_count__, count, avg, __ivm_sum_avg__, __ivm_count_avg__}. At this time, my mind stops working, querybuf.data is way too big, but I still feel like there is some logic associated with these columns, maybe we can use it as an assertion to prove that this query (querybuf.len = 1834) is indeed correct. Since the apply delta query is quite complex, I feel like adding some "if debug then print out the final querybuf.data end if" would be a good idea. we add hidden columns somewhere, also to avoid corner cases, so maybe somewhere we should assert total attribute number is sane.
Attachment
pgsql-hackers by date: