Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY - Mailing list pgsql-bugs

From surya poondla
Subject Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
Date
Msg-id CAOVWO5o46gqX_=BkrzuTRyPijDBWaYAGFVqn7UXrww_4owhJ0w@mail.gmail.com
Whole thread
In response to Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY  ("cca5507" <cca5507@qq.com>)
List pgsql-bugs
Hi ChangAo,

Thank you for the continued review, your test case helped me find a real issue in v3 patch, and I updated the fix accordingly.

You were correct that v3 was wrong. Removing IS NOT NULL and using *= alone in the pre-check was too aggressive: *= treats NULL=NULL=true, which caused 
(NULL,NULL)×2 with a unique index on a nullable column to be flagged as duplicates even though the unique index allows multiple NULLs. 

The updated patch takes a different approach. Instead of removing the guard and relying on *= alone, the pre-check now uses the same per-column equality
operators as the FULL OUTER JOIN accumulated during the same index-scanning loop. These operators treat NULL=NULL as false, which is consistent with how
unique indexes actually work (NULLs are distinct).

For (test, NULL)×2 with a single index on a (non-null):
newdata2.a = newdata.a i.e 'test'='test' that is TRUE
newdata2.* *= newdata.* that is TRUE
Thus the duplicate is caught and error is raised 

For (NULL, NULL)×2 with a single index on a (nullable):
newdata2.a = newdata.a i.e NULL=NULL that is NULL (false)
Thus no duplicate is caught, the refresh correctly succeeds, MV gets 2 rows 

For (test, NULL)×2 with a composite index on (a, b):
newdata2.a = newdata.a i.e TRUE
newdata2.b = newdata.b i.e NULL=NULL that is NULL (false)
Combined: NULL is not caught, refresh correctly succeeds 

Regarding your record_image_eq_variant approach: it correctly handles the NULL-in-indexed-column case, but it introduces a performance regression for
unchanged rows where any non-indexed column contains NULL.
For example, an unchanged row (1, NULL) with a unique index on non-null index on a would require a DELETE+INSERT on every CONCURRENTLY refresh because
record_image_eq_variant((1,NULL),(1,NULL)) returns false. 
This makes CONCURRENTLY impractical for any table where rows contain NULLs in non-indexed columns.

The updated patch for bug 1.

Here is some additional tests I did:
postgres=# SET client_min_messages = WARNING;
SET
postgres=# -- Test 1: (test,NULL)×2, single index on 'a' should ERROR
postgres=#   CREATE TABLE t(a text, b text);
  INSERT INTO t VALUES('test', NULL);
  CREATE MATERIALIZED VCREATE TABLE
postgres=#   INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
  CREATE UNIQUE INDEX ON m(a);
  INSERT INTO t VALUES('test', NULL);
  REFSELECT 1
postgres=#   CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=#   INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=#   REFRESH MATERIALIZED VIEW CONCURRENTLY m;  -- must error
ERROR:  new data for materialized view "m" contains duplicate rows
DETAIL:  Row: (test,)
postgres=#   DROP TABLE t CASCADE;
DROP TABLE

postgres=# -- Test 2: (NULL,NULL)×2, single index on 'a' should SUCCEED
postgres=#   CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=#   INSERT INTO t VALUES(NULL, NULL);
INSERT 0 1
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=#   CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=#   INSERT INTO t VALUES(NULL, NULL);
INSERT 0 1
postgres=#   REFRESH MATERIALIZED VIEW CONCURRENTLY m;  -- must succeed
  SELECT COUNT(*) FROM m;
  DROP TABLE t CASCADE;REFRESH MATERIALIZED VIEW
postgres=#   SELECT COUNT(*) FROM m;  -- should be 2
 count
-------
     2
(1 row)
postgres=#   DROP TABLE t CASCADE;
DROP TABLE

postgres=# --Test 3: (test,NULL)×2, composite index (a,b) should SUCCEED
postgres=#   CREATE TABLE t(a text, b text);
CREATE TABLE
postgres=#   INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=#   CREATE UNIQUE INDEX ON m(a, b);
  INSERT INTO t VALUES('test', NUL  CREATE UNIQUE INDEX ON m(a, b);
CREATE INDEX
postgres=#   INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=#   REFRESH MATERIALIZED VIEW CONCURRENTLY m;  -- must succeed
REFRESH MATERIALIZED VIEW
postgres=#   SELECT COUNT(*) FROM m; 
  DROP TABLE t CASCADE;
  SELECT COUNT(*) FROM m;  -- must be 2
 count
-------
     2
(1 row)
postgres=#   DROP TABLE t CASCADE;
DROP TABLE

postgres=# -- Test 4: unchanged (1,NULL), index on 'a' should SUCCEED.
postgres=#   CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=#   INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=#   CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=#   REFRESH MATERIALIZED VIEW CONCURRENTLY m;  -- must succeed
REFRESH MATERIALIZED VIEW
postgres=#   SELECT * FROM m;  -- must still show (1,)
 a | b
---+---
 1 |
(1 row)
postgres=#   DROP TABLE t CASCADE;
DROP TABLE

postgres=# -- Test 5: (1,NULL)×2, separate index on a AND b, should ERROR
postgres=#   CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=#   INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=#   CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=#   CREATE UNIQUE INDEX ON m(b);
CREATE INDEX
postgres=#   INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=#   REFRESH MATERIALIZED VIEW CONCURRENTLY m;  -- must error
ERROR:  duplicate key value violates unique constraint "m_a_idx"
DETAIL:  Key (a)=(1) already exists.
CONTEXT:  SQL statement "INSERT INTO public.m SELECT (diff.newdata).* FROM pg_temp_2.pg_temp_16535_2 diff WHERE tid IS NULL"
postgres=#   DROP TABLE t CASCADE;
DROP TABLE
postgres=#

Made some minor changes to bug2 patch too.

Regards,
Surya Poondla
Attachment

pgsql-bugs by date:

Previous
From: Dmitriy Kuzmin
Date:
Subject: Re: BUG #19437: temp_tablespaces doesn't work inside a cursor?
Next
From: Andrei Lepikhov
Date:
Subject: Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables