I noticed the following two issues while looking at the code that handles
REFRESH MATERIALIZED VIEW CONCURRENTLY (refresh_by_match_merge() in matview.c):
1.
At the beginning of the function, there is some code that checks for duplicate
rows, but it does not catch the following case:
CREATE TABLE t(a text, b text);
INSERT INTO t VALUES('test', null);
CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
CREATE UNIQUE INDEX ON m(a);
INSERT INTO t VALUES('test', null); -- t now contains two identical rows
REFRESH MATERIALIZED VIEW CONCURRENTLY m;
-> no error, but m still contains only one row!
REFRESH MATERIALIZED VIEW m;
-> error (as expected)
2.
Do I understand correctly that the join creating the "diff" table is given
equality conditions for all columns referenced in any unique indexes? This
led me to think that a unique index on a column with many null entries
would enlarge the "diff" table.
In the following example, creating the second unique index noticeably worsens
the performance of REFRESH MATERIALIZED VIEW CONCURRENTLY:
CREATE MATERIALIZED VIEW s AS SELECT generate_series as x, null as y FROM generate_series(1, 1000000);
CREATE UNIQUE INDEX ON s(x);
REFRESH MATERIALIZED VIEW CONCURRENTLY s;
-> runs for ~1700 ms
CREATE UNIQUE INDEX ON s(y);
REFRESH MATERIALIZED VIEW CONCURRENTLY s;
-> runs for ~9000 ms
Kind regards,
Giuliano