Updating MATERIALIZED VIEWs (Re: [HACKERS] delta relations in AFTERtriggers) - Mailing list pgsql-hackers

From Nico Williams
Subject Updating MATERIALIZED VIEWs (Re: [HACKERS] delta relations in AFTERtriggers)
Date
Msg-id 20170120233822.GC1838@localhost
Whole thread Raw
In response to Re: [HACKERS] delta relations in AFTER triggers  (Kevin Grittner <kgrittn@gmail.com>)
List pgsql-hackers
[Looking at your patch I see that it's not quite related to MVs, so I'm
changing the Subject.  Apologies for the noise.]

[Responding out of order.]

On Fri, Jan 20, 2017 at 03:37:20PM -0600, Kevin Grittner wrote:
> On Fri, Jan 20, 2017 at 2:08 PM, Nico Williams <nico@cryptonector.com> wrote:
> > BTW, automatic updates of certain types of MVs should be easy: add
> > constraints based on NEW/OLD rows from synthetic triggers to the
> > underlying query.
> 
> Convincing me that this is a good idea for actual MVs, versus
> pseudo-MVs using tables, would be an uphill battle.  [...]

I don't think it's necessary, and I would not insist on it.

My alternative MV implementation lets _me_ choose when to update an MV
synchronously, and when to defer refreshes, by using [hand-coded]
triggers.  This is good enough for me.

If these triggers could be automatically generated, that sure would be
nice, but some control would be needed over when to update the MV vs.
mark it as needing a refresh.

> > Our intention is to contribute this.  We're willing to sign
> > reasonable contribution agreements.
> 
> Posting a patch to these lists constitutes an assertion that you
> have authority to share the IP, and are doing so.  Referencing a
> URL is a bit iffy, since it doesn't leave an archival copy of the
> contribution under the community's control.

Fair enough.  I'll post the source file itself.  I've not done the work
of properly integrating it because I need to gauge interest first,
before dedicating a lot of effort to it.

> I am dubious, though, of the approach in general, as stated above.

I'm using this _now_.  With a caveat:

a) the trigger functions needed to either mark an MV as needing a
refresh, or else to update it directly, are hand-coded, and

b) I chose which operations yield synchronous MV updates and which defer
to a refresh.

The MV, in my scheme, is really just a table with triggers that update a
deltas table the same way that a refresh would.  A refresh locks the
table, disables those triggers, populates another table with the current
output of the underlying view, compares to the previous materialization,
and lastly generates, records, and applies deltas to the
materialization.

To give an example, adding a user to a group -> generally fast; deleting
a user (and thus all their group memberships) -> potentially very slow.

The "add a user to a group" case can then yield near real-time updates
of external caches, while the other case results in a deferred REFRESH
so as to not slow down the current transaction.  The deferred REFRESH is
not deferred too long, so the net effect is still very fast updates of
external caches.

> > However, there is a bug in the query planner that prevents this
> > from being very fast.  At some point I want to tackle that bug.
> 
> What bug is that?

I... asked for help on the IRC #postgresql channel.  I never posted here
about it.

Revisiting it now... the main problem was query _preparation time_, not
execution time.  So perhaps not so bad.  Still, it's worth looking into.

The query was something like this:
       SELECT v.data->'verb_name' || '^' || (r.data->'named_acl_name') AS grant_name,
grantee.right_entity_idAS grantee_id       FROM relationships grantee       JOIN relationships grant ON
grantee.left_entity_id= grant.right_entity_id AND            grantee.relationship_type_id IN (10421, 10431, 13591,
13921)           AND grant.relationship_type_id = 10331       JOIN relationships perm_actions ON
grantee.left_entity_id= perm_actions.right_entity_id AND            perm_actions.relationship_type_id = 10381
JOINrelationships verb_in_vs ON            verb_in_vs.right_entity_id = perm_actions.left_entity_id AND
verb_in_vs.relationship_type_id= 10371       JOIN entities v ON v.id = verb_in_vs.left_entity_id       JOIN entities r
ONr.id = grant.left_entity_id;
 
 (This query uses a bit of an EAV schema.  There's an "entities" table with an hstore column for storing attributes
("data")and another table, "relationships" that has (relationship_type_id, left_entity_id, right_entity_id) columns and
whichis indexed by both, left_entity_id and right_entity_id.  EAV schemas hide relevant information from the query
planner,so there is that.)
 

The query plan for this is about as fast as one could hope.  After all,
it has to scan many of the rows.

Now suppose we were adding a new 'grantee' and wanted to generate the
additions that would result in the MV.  We could add this constraint to
the query:
       WHERE grantee.left_entity_id = NEW.left_entity_id AND           grantee.right_entity_id = NEW.right_entity_id;

Now we've basically [almost] fully-specified the primary key for the
grantee table source.

The resulting query plan is actually pretty good.  It has the grantee
table source as the first table source in the inner-most loop.

If I re-write the query using WITH CTEs to get a similarly good plan,
then query preparation runs about 250x faster.

Since I want to update (for some kinds of changes) an MV based on this
query, and I want to do it from some triggers.  I need this to be fast.

I'm not sure when trigger functions' queries get re-prepared, so maybe
I'm concerned over nothing much?

> > I'd appreciate a review, for sure.  Thanks!
> 
> Would it be possible to get your approach running using tables
> and/or (non-materialized) views as an extension?  A trigger-based
> way to maintain pseudo-MVs via triggers might make an interesting
> extension, possibly even included in contrib if it could be shown
> to have advantages over built-in MVs for some non-trivial
> applications.

I can certainly send a patch to include this in /contrib.

> > There's a gotcha w.r.t. NULL columns, but it affects the built-in
> > REFRESH as well, IIRC.  The commentary in our implementation
> > discusses that in more detail.
> 
> Could you report that on a new thread on the lists?  I've seen
> comments about such a "gotcha", but am not clear on the details.
> It probably deserves its own thread.  Once understood, we can
> probably fix it.

OK, will do.

Thanks,

Nico
-- 



pgsql-hackers by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
Next
From: Michael Paquier
Date:
Subject: Re: [HACKERS] increasing the default WAL segment size