Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY - Mailing list pgsql-bugs

From surya poondla
Subject Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
Date
Msg-id CAOVWO5pQ4jyR1M6XYkjkg7c1KshrzaG-9mng7AGBGghZpYKR-Q@mail.gmail.com
Whole thread
In response to Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY  ("cca5507" <cca5507@qq.com>)
Responses Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
List pgsql-bugs
Hi ChangAo,

Thank you for the detailed review.

For issue 1, my fix removes the IS NOT NULL guard from the pre-check so
that *= can detect all duplicate rows, including those containing NULLs.
(Note: The semantics of *=  has always treated NULL as equal to NULL.)

The reasoning is straightforward: the JOIN uses *= to match newdata rows
against MV rows. If newdata contains two *=-equal rows, both would match
the same MV row in the JOIN, producing a wrong diff. The pre-check must
therefore use the same *= semantics to catch exactly those duplicates 
which is what my fix does by removing the IS NOT NULL guard.
The IS NOT NULL guard was the bug as it was hiding real duplicates from detection.

Your approach leaves the pre-check unchanged and instead replaces *= in
the JOIN with record_image_eq_variant (NULL != NULL). I see two concerns:
1. record_image_eq_variant applies NULL != NULL globally to all rows in
   the JOIN, not just duplicate ones. This means any unchanged row
   containing any NULL in any column will never match its counterpart
   during the JOIN, causing a DELETE + INSERT for that row on every
   refresh even when the data has not changed. The original issue 2 was
   specifically about nullable indexed columns, your fix extends the
   performance problem to all nullable columns anywhere in the row,
   which makes the performance worse than issue 2.
2. The error surfaced becomes a unique_violation from the index rather
   than the explicit "contains duplicate rows" message, which is harder
   for users to diagnose.

Regards,
Surya Poondla

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables
Next
From: surya poondla
Date:
Subject: Re: BUG #19382: Server crash at __nss_database_lookup