Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW - Mailing list pgsql-hackers

From Adam Brusselback
Subject Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW
Date
Msg-id CAMjNa7fJUwcOxf+qV8g+tCQ-3E-YAiKgE_Qs6u-xjdxe12T0SQ@mail.gmail.com
Whole thread Raw
In response to Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW  (Kirk Wolak <wolakk@gmail.com>)
List pgsql-hackers
>   Our situation was a wonderful MV with all the columns we needed (some
> hard to calculate) to augment search data done millions of times/day.  It
> was a thing of beauty.  Until we realized we needed to update 1 record
> (vendor inventory UPDATE date/time) every time we processed a file
> (something we do 24x7, a hundred times each hour!
>
>   For that ONE field, we ended up doing REFRESH MV concurrently;  OVER
> 2,000 times per day.

Thanks for the feedback and the use case, Kirk.

Regarding that specific scenario where a single column ("last updated" or similar) churns significantly faster than the heavy-computation columns:
Even with this patch, you might find it beneficial to separate that high-velocity column into its own small materialized view (or regular view) and join it to the main MV at query time. That will reduce the bloat you get on the main MV by quite a lot, especially if you have very wide rows (which it seems like you do).

I initially tried to implement logic that would allow for direct UPDATEs (which would enable HOT updates). However, to handle rows that matched the predicate but were no longer present in the new source data, I had to run an anti-join to identify them for deletion. That approach caused performance issues, so I settled on the "Prune + Upsert" strategy (DELETE matching rows, then INSERT from source).

Because this patch performs a delete/insert cycle, updating that one timestamp column will still result in rewriting the whole tuple in the MV.

>   For that ONE field, we ended up doing REFRESH MV concurrently;  OVER
> 2,000 times per day.

That said, 2,000 refreshes per day is nothing for this implementation, provided your updates are selective enough and your queries allow for predicate push-down to the base tables.

I look forward to your thoughts after reviewing the code.

Thanks,
Adam Brusselback

pgsql-hackers by date:

Previous
From: Srinath Reddy Sadipiralla
Date:
Subject: Re: pg_dump:qemu: uncaught target signal 7 (Bus error) - core dumped
Next
From: Alvaro Herrera
Date:
Subject: Re: log_min_messages per backend type