Re: Feature: triggers on materialized views - Mailing list pgsql-hackers

From Mitar
Subject Re: Feature: triggers on materialized views
Date
Msg-id CAKLmikNX84jxB3=4BMUtHP91XBSTiW8GY6mT-iP3P-m_LpYFWQ@mail.gmail.com
Whole thread Raw
In response to Feature: triggers on materialized views  (Mitar <mmitar@gmail.com>)
Responses Re: Feature: triggers on materialized views
Re: Feature: triggers on materialized views
List pgsql-hackers
Hi!

I have made an updated version of the patch, added tests and
documentation changes. This is my view now a complete patch. Please
provide any feedback or comments you might have for me to improve the
patch. I will also add it to commitfest.

A summary of the patch: This patch enables adding AFTER triggers (both
ROW and STATEMENT) on materialized views. They are fired when doing
REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
Triggers are not fired if you call REFRESH without CONCURRENTLY. This
is based on some discussion on the mailing list because implementing
it for without CONCURRENTLY would require us to add logic for firing
triggers where there was none before (and is just an efficient heap
swap).

To be able to create a materialized view without data, specify
triggers, and REFRESH CONCURRENTLY so that those triggers would be
called also for initial data, I have tested and determined that there
is no reason why REFRESH CONCURRENTLY could not be run on
uninitialized materialized view. So I removed that check and things
seem to just work. Including triggers being called for initial data. I
think this makes REFRESH CONCURRENTLY have one less special case which
is in general nicer.

I have run tests and all old tests still succeed. I have added more
tests for the new feature.

I have run benchmark to evaluate the impact of me changing
refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for
rows which were just updated. In fact it seems this improves
performance slightly (4% in my benchmark, mean over 10 runs). I guess
that this is because it is cheaper to just change one column's values
(what benchmark is doing when changing rows) instead of removing and
inserting the whole row. Because REFRESH MATERIALIZED VIEW
CONCURRENTLY is meant to be used when not a lot of data has been
changed anyway, I find this a pleasantly surprising additional
improvement in this patch. I am attaching the benchmark script I have
used. I compared the time of the final refresh query in the script. (I
would love if pgbench could take a custom init script to run before
the timed part of the script.)


Mitar

On Mon, Dec 24, 2018 at 12:59 PM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> Based on discussion about observing changes on an open query in a
> reactive manner (to support reactive web applications) [1], I
> identified that one critical feature is missing to fully implement
> discussed design of having reactive queries be represented as
> materialized views, and changes to these materialized views would then
> be observed and pushed to the client through LISTEN/NOTIFY.
>
> This is my first time contributing to PostgreSQL, so I hope I am
> starting this process well.
>
> I would like to propose that support for AFTER triggers are added to
> materialized views. I experimented a bit and it seems this is mostly
> just a question of enabling/exposing them. See attached patch. This
> enabled me to add trigger to a material view which mostly worked. Here
> are my findings.
>
> Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
> per statement and per row. There are few improvements which could be
> done:
>
> - Currently only insert and remove operations are done on the
> materialized view. This is because the current logic just removes
> changed rows and inserts new rows.
> - In current concurrently refresh logic those insert and remove
> operations are made even if there are no changes to be done. Which
> triggers a statement trigger unnecessary. A small improvement could be
> to skip the statement in that case, but looking at the code this seems
> maybe tricky because both each of inserts and deletions are done
> inside one query each.
> - Current concurrently refresh logic does never do updates on existing
> rows. It would be nicer to have that so that triggers are more aligned
> with real changes to the data. So current two queries could be changed
> to three, each doing one of the insert, update, and delete.
>
> Non-concurrent refresh does not trigger any trigger. But it seems all
> data to do so is there (previous table, new table), at least for the
> statement-level trigger. Row-level triggers could also be simulated
> probably (with TRUNCATE and INSERT triggers).
>
> [1]
https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

Attachment

pgsql-hackers by date:

Previous
From: "Tsunakawa, Takayuki"
Date:
Subject: RE: Speeding up creating UPDATE/DELETE generic plan for partitionedtable into a lot
Next
From: Mitar
Date:
Subject: Re: Feature: temporary materialized views