Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW - Mailing list pgsql-hackers

From Adam Brusselback
Subject Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW
Date
Msg-id CAMjNa7esgLwWU8fmPU7V_AM6iLg1dRmpiAufL-Q9jdWpopfTNg@mail.gmail.com
Whole thread Raw
In response to Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW  (Dharin Shah <dharinshah95@gmail.com>)
Responses Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW
List pgsql-hackers
Hi Dharin, thanks for the review.

> 1. indnatts vs indnkeyatts

Good catch. Will fix.

> 2. Subqueries -> Error

That comment is wrong, I never added a check for that because it turned out to be unnecessary. Will remove.

> 3. Concurrency gap / safety model

To answer your questions directly:

1. The goal is to be safe against concurrent partial refreshes on overlapping rows, not just concurrent DML on base tables.
2. The intent is maintenance-like and safe by default.

Because we lose the physical lock on the row after the DELETE, I plan to enforce that safety default via transaction-level advisory locks acquired before the DELETE with somethin like:

    SELECT pg_advisory_xact_lock(matviewOid, hashtext(ROW(key_cols)::text))
    FROM matview 
    WHERE (condition);

Concurrent refreshes on the same logical rows will serialize while non-overlapping rows still run in parallel.

This also made me think about whether the CONCURRENTLY keyword is doing the right thing here. Here's how the guarantees break down across all the refresh modes:

Refresh Command / State                 | Base Table Lock  | Concurrent Reads? | Concurrent Writes? | Same-Row Concurrent Refreshes
----------------------------------------+------------------+-------------------+--------------------+------------------------------
Standard Full Refresh                   | ACCESS EXCLUSIVE | Blocked           | Blocked            | Blocked (Table Level)
CONCURRENTLY (Full)                     | EXCLUSIVE        | Allowed           | Blocked            | Blocked (Table Level)
Partial (WHERE) - Current Patch         | ROW EXCLUSIVE    | Allowed           | Allowed            | Race condition (Fails)
Partial (WHERE) - With Advisory Locks   | ROW EXCLUSIVE    | Allowed           | Allowed            | Serialized (Waits)
Partial (CONCURRENTLY WHERE)            | EXCLUSIVE        | Allowed           | Blocked            | Serialized (Waits)

Because of this, the `CONCURRENTLY` distinction gets inverted with a `WHERE` clause. With a full refresh, `CONCURRENTLY` is the more permissive option (allowing readers). But here, the bare `WHERE` path allows both reads and writes, while `CONCURRENTLY WHERE` blocks writers. Non-concurrent ends up being the more permissive option, which goes against what the keyword generally implies.

One option is to swap the two implementations to restore that intuition. `CONCURRENTLY WHERE` becomes the advisory locks approach (maximum throughput), and bare `WHERE` becomes the diff approach (conservative, blocks writers). On the other hand, `CONCURRENTLY` has historically meant the diff-based algorithm specifically, not just a lower lock level.

I don't have a strong opinion here and would rather let the community decide. The updated patch will leave the algorithms as-is for now. Happy to swap them if that's the preferred direction.

Will post an updated patch soon.

Thanks,
Adam Brusselback

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Trying out
Next
From: Bharath Rupireddy
Date:
Subject: Re: Add logical_decoding_spill_limit to cap spill file disk usage per slot