Thread: Delta Materialized View Refreshes?

Delta Materialized View Refreshes?

From
Jeremy Finzel
Date:
To me the biggest disadvantage of using a materialized view instead of a table delta load for an ETL process, for example, is that the materialized view is very inefficient in terms of write overhead, for of course it rewrites the whole set of data in the table, meaning heap, indexes, WAL and everything that goes with it.

Couldn't this be coded to only update changed records?  As an example, we have a materialized view definition that holds summarized data on customer records, and it only changes when new customers are added.  It would be so nice to only write that change.

I recognize that it may take longer, but for our environment as I'm sure is true of many others, we are often much more excited about reducing write churn than read churn.

If in fact it could be option, I am curious if we could leverage the work being done on the merge functionality to do a "merge" from the materialized view definition into the actual persistent object.

If people really like the current functionality, then at least it could be an option like REFRESH MATERIALIZED VIEW DELTA or something - I dunno I am more interested in just having the feature!

Many thanks for any thoughts on this subject.
Jeremy

Re: Delta Materialized View Refreshes?

From
Jordan Deitch
Date:
Hi Jeremy,

Could you apply something similar using triggers?
One question would be how PG would identify changes to existing rows - using the replication facilities to essentially
replicateinto the view? This would be quite tricky I reckon. Otherwise a change to the underlying table may not
propagatecorrectly to the MV. 
 


-- 
Jordan Deitch
https://id.rsa.pub/


Re: Delta Materialized View Refreshes?

From
Jeremy Finzel
Date:

Could you apply something similar using triggers?
One question would be how PG would identify changes to existing rows - using the replication facilities to essentially replicate into the view? This would be quite tricky I reckon. Otherwise a change to the underlying table may not propagate correctly to the MV.

That's not what I had in mind.  I only mean when REFRESH MATERIALIZED VIEW is run, it gathers the results of the view in memory, then instead of essentially "wiping and reloading" the table, it would only write the differences.  So if 90% of the rows would be the same as before the refresh, we only update 10% of the rows.

This would also mean materialized views could get bloated just like tables.

Thanks,
Jeremy

Re: Delta Materialized View Refreshes?

From
Isaac Morland
Date:
On Fri, 14 Sep 2018 at 16:26, Jeremy Finzel <finzelj@gmail.com> wrote:

Could you apply something similar using triggers?
One question would be how PG would identify changes to existing rows - using the replication facilities to essentially replicate into the view? This would be quite tricky I reckon. Otherwise a change to the underlying table may not propagate correctly to the MV.

That's not what I had in mind.  I only mean when REFRESH MATERIALIZED VIEW is run, it gathers the results of the view in memory, then instead of essentially "wiping and reloading" the table, it would only write the differences.  So if 90% of the rows would be the same as before the refresh, we only update 10% of the rows.

On a related note, I've mused about allowing a WHERE clause on REFRESH MATERIALIZED VIEW. To start with, I imagine limiting it to refer to the columns of a primary key (which implies that primary key constraints would have to be allowed). As long as this is done, I think it's pretty clear what the semantics would have to be, at least as to the new view contents: the equivalent of DELETE with the WHERE clause, followed by INSERT of the view expression with the same WHERE clause applied.

Re: Delta Materialized View Refreshes?

From
denty
Date:
The idea of allowing a WHERE clause to be appended to REFRESH MATERIALIZED
VIEW seems useful.

It would enable those that know well the pattern of data modification in
their underlying use case to schedule delta-updates (say, from crontab).

And also it would be a useful as a foundation for more ambitious ideas. For
example, some folk can/wish to do something entirely custom that is
trigger-based. Others such as Nico's have toolkits that intend to cover
quite a lot of the automagical refresh cases (see this other thread
https://www.postgresql-archive.org/Query-Rewrite-for-Materialized-Views-FDW-Extension-td6015235.html),
and it allows the data to remain in an MVIEW to boot, rather than forcing
them to use normal tables instead. It could also be the foundation for
something much more integrated such as Corey's CONTINUOUS MATERIALIZED VIEW
concept.

So I'm certainly supportive of the idea.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: Delta Materialized View Refreshes?

From
denty
Date:
Hi folks,

I had a crack at this, and it was pretty simple to get something working to
play around with, and it seems like it might be useful.

I developed it against 10.1, as that's what I happened to be working with at
the time. The patch is pretty small, and I hoped it would apply cleanly
against 11. Unfortunately it doesn't, but I doubt the issues are
substantial. If there is interest in moving this forward, I'll update and
re-share.

The patch enables pretty much exactly what Jeremy suggests — something like
"refresh materialized view concurrently testview where type = 'main';" —
with fairly obvious semantics.

Welcome comments on the patch or approach.

denty.

(Seems I can't attach via the web interface, so copy/paste patch below.)





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: Delta Materialized View Refreshes?

From
legrand legrand
Date:
denty wrote
> (Seems I can't attach via the web interface, so copy/paste patch below.)
> 
> --
> Sent from:
> http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Sending attachments from this web site (that is not an official postgres
website)
has been disabled as requested by postgres community members
(attachments where stored locally in nabble but not sent)

see
https://www.postgresql-archive.org/postgresql-archive-org-td6035059i20.html#a6035631





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: Delta Materialized View Refreshes?

From
denty
Date:
Hi folks,

I've shared a new patch against 11.0, which seems to work as expected.
(Message ID 5100C2B3-641B-4A35-86D0-12ED2E618101@QQdd.eu.)

While playing with it, it is actually quite easy to get it confused. And so
I wonder — is it actually what we want?

For example, if I refresh including a WHERE that filters /out/ some content
presently in the MV, but filters /in/ some new content relating to those
same rows, then we predictably get a fail. Using the following example MV
MV, 'testview', AS SELECT test.type, test.message, count(1) AS count FROM
test GROUP BY test.type, test.message, then a refresh materialized view
concurrently testview where type = 'main' and count>2 hits:

ERROR:  duplicate key value violates unique constraint
"testview_type_message_idx"
DETAIL:  Key (type, message)=(main, hello world) already exists.
CONTEXT:  SQL statement "INSERT INTO public.testview SELECT (diff.newdata).*
FROM pg_temp_3.pg_temp_16390_2 diff WHERE tid IS NULL"

The message can probably be cleaned up. But the root cause is clearly in the
fact that REFRESH ... WHERE really needs to be used quite carefully.

I mused about restricting the WHERE clause Vars to allow reference only to
columns that are part of the MV's UNIQUE index. It seems it would prevent
the issue arising in my simple example, but is it always necessary? And
would it be overly restrictive? (For example: would it prevent people
issuing delta refreshes and including clauses that make the refresh
performant — because perhaps it helps the planner see a short cut to speedy
execution?)

On a different topic, in implementing it, I noticed that there is
rudimentary code-level support for incremental refreshes (see
Open/CloseMatViewIncrementalMaintenance() and
MatViewIncrementalMaintenanceIsEnabled()), but the facility is not
hook-able. There's another discussion (Flexible permissions for REFRESH
MATERIALIZED VIEW), and I wonder if a more interesting feature would be to
either allow the incremental refresh barriers to be hooked by extensions, or
even to offer a fine-grained permission that allows direct manipulation of
data in the MV's underlying table.

Far as I can see, allowing extensions to hook the incremental refresh APIs
would be trivial. Exposing the same via a fine-grained permission would
certainly be much harder but it might enable advanced delta-refresh
strategies to emerge that are written in high level languages such as
PL/pgSQL or Java (etc.) — that is certainly desirable.

denty.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html