Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY - Mailing list pgsql-bugs
| From | surya poondla |
|---|---|
| Subject | Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY |
| Date | |
| Msg-id | CAOVWO5o46gqX_=BkrzuTRyPijDBWaYAGFVqn7UXrww_4owhJ0w@mail.gmail.com Whole thread |
| In response to | Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY ("cca5507" <cca5507@qq.com>) |
| List | pgsql-bugs |
Hi ChangAo,
Thank you for the continued review, your test case helped me find a real issue in v3 patch, and I updated the fix accordingly.
You were correct that v3 was wrong. Removing IS NOT NULL and using *= alone in the pre-check was too aggressive: *= treats NULL=NULL=true, which caused
Thank you for the continued review, your test case helped me find a real issue in v3 patch, and I updated the fix accordingly.
You were correct that v3 was wrong. Removing IS NOT NULL and using *= alone in the pre-check was too aggressive: *= treats NULL=NULL=true, which caused
(NULL,NULL)×2 with a unique index on a nullable column to be flagged as duplicates even though the unique index allows multiple NULLs.
The updated patch takes a different approach. Instead of removing the guard and relying on *= alone, the pre-check now uses the same per-column equality
operators as the FULL OUTER JOIN accumulated during the same index-scanning loop. These operators treat NULL=NULL as false, which is consistent with how
unique indexes actually work (NULLs are distinct).
For (test, NULL)×2 with a single index on a (non-null):
newdata2.a = newdata.a i.e 'test'='test' that is TRUE
newdata2.* *= newdata.* that is TRUE
Thus the duplicate is caught and error is raised
The updated patch takes a different approach. Instead of removing the guard and relying on *= alone, the pre-check now uses the same per-column equality
operators as the FULL OUTER JOIN accumulated during the same index-scanning loop. These operators treat NULL=NULL as false, which is consistent with how
unique indexes actually work (NULLs are distinct).
For (test, NULL)×2 with a single index on a (non-null):
newdata2.a = newdata.a i.e 'test'='test' that is TRUE
newdata2.* *= newdata.* that is TRUE
Thus the duplicate is caught and error is raised
For (NULL, NULL)×2 with a single index on a (nullable):
newdata2.a = newdata.a i.e NULL=NULL that is NULL (false)
Thus no duplicate is caught, the refresh correctly succeeds, MV gets 2 rows
For (test, NULL)×2 with a composite index on (a, b):
newdata2.a = newdata.a i.e TRUE
newdata2.b = newdata.b i.e NULL=NULL that is NULL (false)
Combined: NULL is not caught, refresh correctly succeeds
Regarding your record_image_eq_variant approach: it correctly handles the NULL-in-indexed-column case, but it introduces a performance regression for
unchanged rows where any non-indexed column contains NULL.
For example, an unchanged row (1, NULL) with a unique index on non-null index on a would require a DELETE+INSERT on every CONCURRENTLY refresh because
record_image_eq_variant((1,NULL),(1,NULL)) returns false.
This makes CONCURRENTLY impractical for any table where rows contain NULLs in non-indexed columns.
The updated patch for bug 1.
Here is some additional tests I did:
postgres=# SET client_min_messages = WARNING;
SET
postgres=# -- Test 1: (test,NULL)×2, single index on 'a' should ERROR
postgres=# CREATE TABLE t(a text, b text);
INSERT INTO t VALUES('test', NULL);
CREATE MATERIALIZED VCREATE TABLE
postgres=# INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=# CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
CREATE UNIQUE INDEX ON m(a);
INSERT INTO t VALUES('test', NULL);
REFSELECT 1
postgres=# CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=# INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m; -- must error
ERROR: new data for materialized view "m" contains duplicate rows
DETAIL: Row: (test,)
postgres=# DROP TABLE t CASCADE;
DROP TABLE
postgres=# -- Test 2: (NULL,NULL)×2, single index on 'a' should SUCCEED
postgres=# CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=# INSERT INTO t VALUES(NULL, NULL);
INSERT 0 1
postgres=# CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=# CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=# INSERT INTO t VALUES(NULL, NULL);
INSERT 0 1
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m; -- must succeed
SELECT COUNT(*) FROM m;
DROP TABLE t CASCADE;REFRESH MATERIALIZED VIEW
postgres=# SELECT COUNT(*) FROM m; -- should be 2
count
-------
2
(1 row)
postgres=# DROP TABLE t CASCADE;
DROP TABLE
postgres=# --Test 3: (test,NULL)×2, composite index (a,b) should SUCCEED
postgres=# CREATE TABLE t(a text, b text);
CREATE TABLE
postgres=# INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=# CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=# CREATE UNIQUE INDEX ON m(a, b);
INSERT INTO t VALUES('test', NUL CREATE UNIQUE INDEX ON m(a, b);
CREATE INDEX
postgres=# INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m; -- must succeed
REFRESH MATERIALIZED VIEW
postgres=# SELECT COUNT(*) FROM m;
DROP TABLE t CASCADE;
SELECT COUNT(*) FROM m; -- must be 2
count
-------
2
(1 row)
postgres=# DROP TABLE t CASCADE;
DROP TABLE
postgres=# -- Test 4: unchanged (1,NULL), index on 'a' should SUCCEED.
postgres=# CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=# INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=# CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=# CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m; -- must succeed
REFRESH MATERIALIZED VIEW
postgres=# SELECT * FROM m; -- must still show (1,)
a | b
---+---
1 |
(1 row)
postgres=# DROP TABLE t CASCADE;
DROP TABLE
postgres=# -- Test 5: (1,NULL)×2, separate index on a AND b, should ERROR
postgres=# CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=# INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=# CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=# CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=# CREATE UNIQUE INDEX ON m(b);
CREATE INDEX
postgres=# INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m; -- must error
ERROR: duplicate key value violates unique constraint "m_a_idx"
DETAIL: Key (a)=(1) already exists.
CONTEXT: SQL statement "INSERT INTO public.m SELECT (diff.newdata).* FROM pg_temp_2.pg_temp_16535_2 diff WHERE tid IS NULL"
postgres=# DROP TABLE t CASCADE;
DROP TABLE
postgres=#
Made some minor changes to bug2 patch too.
Regards,
Surya Poondla
The updated patch for bug 1.
Here is some additional tests I did:
postgres=# SET client_min_messages = WARNING;
SET
postgres=# -- Test 1: (test,NULL)×2, single index on 'a' should ERROR
postgres=# CREATE TABLE t(a text, b text);
INSERT INTO t VALUES('test', NULL);
CREATE MATERIALIZED VCREATE TABLE
postgres=# INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=# CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
CREATE UNIQUE INDEX ON m(a);
INSERT INTO t VALUES('test', NULL);
REFSELECT 1
postgres=# CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=# INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m; -- must error
ERROR: new data for materialized view "m" contains duplicate rows
DETAIL: Row: (test,)
postgres=# DROP TABLE t CASCADE;
DROP TABLE
postgres=# -- Test 2: (NULL,NULL)×2, single index on 'a' should SUCCEED
postgres=# CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=# INSERT INTO t VALUES(NULL, NULL);
INSERT 0 1
postgres=# CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=# CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=# INSERT INTO t VALUES(NULL, NULL);
INSERT 0 1
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m; -- must succeed
SELECT COUNT(*) FROM m;
DROP TABLE t CASCADE;REFRESH MATERIALIZED VIEW
postgres=# SELECT COUNT(*) FROM m; -- should be 2
count
-------
2
(1 row)
postgres=# DROP TABLE t CASCADE;
DROP TABLE
postgres=# --Test 3: (test,NULL)×2, composite index (a,b) should SUCCEED
postgres=# CREATE TABLE t(a text, b text);
CREATE TABLE
postgres=# INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=# CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=# CREATE UNIQUE INDEX ON m(a, b);
INSERT INTO t VALUES('test', NUL CREATE UNIQUE INDEX ON m(a, b);
CREATE INDEX
postgres=# INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m; -- must succeed
REFRESH MATERIALIZED VIEW
postgres=# SELECT COUNT(*) FROM m;
DROP TABLE t CASCADE;
SELECT COUNT(*) FROM m; -- must be 2
count
-------
2
(1 row)
postgres=# DROP TABLE t CASCADE;
DROP TABLE
postgres=# -- Test 4: unchanged (1,NULL), index on 'a' should SUCCEED.
postgres=# CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=# INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=# CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=# CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m; -- must succeed
REFRESH MATERIALIZED VIEW
postgres=# SELECT * FROM m; -- must still show (1,)
a | b
---+---
1 |
(1 row)
postgres=# DROP TABLE t CASCADE;
DROP TABLE
postgres=# -- Test 5: (1,NULL)×2, separate index on a AND b, should ERROR
postgres=# CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=# INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=# CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=# CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=# CREATE UNIQUE INDEX ON m(b);
CREATE INDEX
postgres=# INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m; -- must error
ERROR: duplicate key value violates unique constraint "m_a_idx"
DETAIL: Key (a)=(1) already exists.
CONTEXT: SQL statement "INSERT INTO public.m SELECT (diff.newdata).* FROM pg_temp_2.pg_temp_16535_2 diff WHERE tid IS NULL"
postgres=# DROP TABLE t CASCADE;
DROP TABLE
postgres=#
Made some minor changes to bug2 patch too.
Regards,
Surya Poondla
Attachment
pgsql-bugs by date: