Thread: Delta Materialized View Refreshes?
To me the biggest disadvantage of using a materialized view instead of a table delta load for an ETL process, for example, is that the materialized view is very inefficient in terms of write overhead, for of course it rewrites the whole set of data in the table, meaning heap, indexes, WAL and everything that goes with it.
Couldn't this be coded to only update changed records? As an example, we have a materialized view definition that holds summarized data on customer records, and it only changes when new customers are added. It would be so nice to only write that change.
I recognize that it may take longer, but for our environment as I'm sure is true of many others, we are often much more excited about reducing write churn than read churn.
If in fact it could be option, I am curious if we could leverage the work being done on the merge functionality to do a "merge" from the materialized view definition into the actual persistent object.
If people really like the current functionality, then at least it could be an option like REFRESH MATERIALIZED VIEW DELTA or something - I dunno I am more interested in just having the feature!
Many thanks for any thoughts on this subject.
Jeremy
Hi Jeremy, Could you apply something similar using triggers? One question would be how PG would identify changes to existing rows - using the replication facilities to essentially replicateinto the view? This would be quite tricky I reckon. Otherwise a change to the underlying table may not propagatecorrectly to the MV. -- Jordan Deitch https://id.rsa.pub/
Could you apply something similar using triggers?
One question would be how PG would identify changes to existing rows - using the replication facilities to essentially replicate into the view? This would be quite tricky I reckon. Otherwise a change to the underlying table may not propagate correctly to the MV.
That's not what I had in mind. I only mean when REFRESH MATERIALIZED VIEW is run, it gathers the results of the view in memory, then instead of essentially "wiping and reloading" the table, it would only write the differences. So if 90% of the rows would be the same as before the refresh, we only update 10% of the rows.
This would also mean materialized views could get bloated just like tables.
Thanks,
Jeremy
Jeremy
On Fri, 14 Sep 2018 at 16:26, Jeremy Finzel <finzelj@gmail.com> wrote:
Could you apply something similar using triggers?
One question would be how PG would identify changes to existing rows - using the replication facilities to essentially replicate into the view? This would be quite tricky I reckon. Otherwise a change to the underlying table may not propagate correctly to the MV.That's not what I had in mind. I only mean when REFRESH MATERIALIZED VIEW is run, it gathers the results of the view in memory, then instead of essentially "wiping and reloading" the table, it would only write the differences. So if 90% of the rows would be the same as before the refresh, we only update 10% of the rows.
On a related note, I've mused about allowing a WHERE clause on REFRESH MATERIALIZED VIEW. To start with, I imagine limiting it to refer to the columns of a primary key (which implies that primary key constraints would have to be allowed). As long as this is done, I think it's pretty clear what the semantics would have to be, at least as to the new view contents: the equivalent of DELETE with the WHERE clause, followed by INSERT of the view expression with the same WHERE clause applied.
The idea of allowing a WHERE clause to be appended to REFRESH MATERIALIZED VIEW seems useful. It would enable those that know well the pattern of data modification in their underlying use case to schedule delta-updates (say, from crontab). And also it would be a useful as a foundation for more ambitious ideas. For example, some folk can/wish to do something entirely custom that is trigger-based. Others such as Nico's have toolkits that intend to cover quite a lot of the automagical refresh cases (see this other thread https://www.postgresql-archive.org/Query-Rewrite-for-Materialized-Views-FDW-Extension-td6015235.html), and it allows the data to remain in an MVIEW to boot, rather than forcing them to use normal tables instead. It could also be the foundation for something much more integrated such as Corey's CONTINUOUS MATERIALIZED VIEW concept. So I'm certainly supportive of the idea. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Hi folks, I had a crack at this, and it was pretty simple to get something working to play around with, and it seems like it might be useful. I developed it against 10.1, as that's what I happened to be working with at the time. The patch is pretty small, and I hoped it would apply cleanly against 11. Unfortunately it doesn't, but I doubt the issues are substantial. If there is interest in moving this forward, I'll update and re-share. The patch enables pretty much exactly what Jeremy suggests — something like "refresh materialized view concurrently testview where type = 'main';" — with fairly obvious semantics. Welcome comments on the patch or approach. denty. (Seems I can't attach via the web interface, so copy/paste patch below.) -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
denty wrote > (Seems I can't attach via the web interface, so copy/paste patch below.) > > -- > Sent from: > http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html Sending attachments from this web site (that is not an official postgres website) has been disabled as requested by postgres community members (attachments where stored locally in nabble but not sent) see https://www.postgresql-archive.org/postgresql-archive-org-td6035059i20.html#a6035631 -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Hi folks, I've shared a new patch against 11.0, which seems to work as expected. (Message ID 5100C2B3-641B-4A35-86D0-12ED2E618101@QQdd.eu.) While playing with it, it is actually quite easy to get it confused. And so I wonder — is it actually what we want? For example, if I refresh including a WHERE that filters /out/ some content presently in the MV, but filters /in/ some new content relating to those same rows, then we predictably get a fail. Using the following example MV MV, 'testview', AS SELECT test.type, test.message, count(1) AS count FROM test GROUP BY test.type, test.message, then a refresh materialized view concurrently testview where type = 'main' and count>2 hits: ERROR: duplicate key value violates unique constraint "testview_type_message_idx" DETAIL: Key (type, message)=(main, hello world) already exists. CONTEXT: SQL statement "INSERT INTO public.testview SELECT (diff.newdata).* FROM pg_temp_3.pg_temp_16390_2 diff WHERE tid IS NULL" The message can probably be cleaned up. But the root cause is clearly in the fact that REFRESH ... WHERE really needs to be used quite carefully. I mused about restricting the WHERE clause Vars to allow reference only to columns that are part of the MV's UNIQUE index. It seems it would prevent the issue arising in my simple example, but is it always necessary? And would it be overly restrictive? (For example: would it prevent people issuing delta refreshes and including clauses that make the refresh performant — because perhaps it helps the planner see a short cut to speedy execution?) On a different topic, in implementing it, I noticed that there is rudimentary code-level support for incremental refreshes (see Open/CloseMatViewIncrementalMaintenance() and MatViewIncrementalMaintenanceIsEnabled()), but the facility is not hook-able. There's another discussion (Flexible permissions for REFRESH MATERIALIZED VIEW), and I wonder if a more interesting feature would be to either allow the incremental refresh barriers to be hooked by extensions, or even to offer a fine-grained permission that allows direct manipulation of data in the MV's underlying table. Far as I can see, allowing extensions to hook the incremental refresh APIs would be trivial. Exposing the same via a fine-grained permission would certainly be much harder but it might enable advanced delta-refresh strategies to emerge that are written in high level languages such as PL/pgSQL or Java (etc.) — that is certainly desirable. denty. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html