Re: Query Rewrite for Materialized Views (Postgres Extension) - Mailing list pgsql-hackers

From Nico Williams
Subject Re: Query Rewrite for Materialized Views (Postgres Extension)
Date
Msg-id 20180619154723.GD4200@localhost
Whole thread Raw
In response to Re: Query Rewrite for Materialized Views (Postgres Extension)  (Dent John <denty@qqdd.eu>)
Responses Re: Query Rewrite for Materialized Views (Postgres Extension)
List pgsql-hackers
On Tue, Jun 19, 2018 at 08:46:06AM +0100, Dent John wrote:
> I’m pretty impressed anything in this space can be written entirely in
> PlPGQSL!

https://github.com/twosigma/postgresql-contrib

PG is quite powerful!

I have even implemented a COMMIT TRIGGER in pure PlPgSQL.

You'll notice I make extensive use of record/table types.

> If you did integrate your implementation, it would be easy for my
> Extension to read from a table other than the one which it gets the MV
> definition from... Although having said that, if you went down the
> route you suggest, would not you make that “regular table” into a
> first class scheme object very much like Corey’s CONTINUOUS
> MATERIALIZED VIEW object concept?

I know nothing about the CONTINUOUS MATERIALIZED VIEW concept.  What
that would imply to me seems... difficult to achieve.  There will be
view queries that are difficult or impossible to automatically write
triggers for that update an MV synchronously.

> It is interesting that you can put triggers onto the table though, as
> that leads well in to use cases where it is desirable to “stack” MVs
> upon each other. (I’m not immediately sure whether such a use case is
> still needed in face of an always-up-to-date MV feature such as is
> described, but I’ve seen it elsewhere.)

I have done exactly this sort of MV chaining.

In my use case I had an MV of a nesting group transitive closure and
then another of a join between that and user group memberships to get a
complete user group transitive closure.

The transitive closure of nesting groups being computed via a RECURSIVE
CTE...  In principle one can understand such a query and automatically
write DMLs to update the MV on the fly (I've done this _manually_), but
the moment you do any bulk updates out of sequence you can't, and then
you have to refresh the view, so you see, I don't quite believe we can
have a true continuously materialized view :(

For me the key requirement is the ability to generate incremental
updates to an external system, but also the whole thing has to be fast.

> You say you’d like to base it off a VIEW’s AST (rather than, I
> presume, you must parse the reconstructed VIEW source text as SQL?),

PG already stores the AST.  There's no need to write a new parser when
PG already has one.  At the end of the day you need to analyze an AST
for the MV's source query in order to automatically write the triggers
to keep it updated (or mark it as needing a refresh).

> and I do agree — that’s probably the right direction... it does seem
> to me there is scope to leverage the “bottom half” of the ASSERTION
> stuff from Dave Fetter that Corey linked to — i.e., the part of it
> that manages triggers. Still leaves the AST crawling deciding what to
> actually do once a change is caught.

I'll search for this.

> Really good to hear about progress in this area.

Eh, I've not actually implemented any automatic generation of triggers
to update MVs.  I've written enough such triggers manually to believe
that *some* of them could be written by software.  If you look at my
sketch for how to do it, you'll notice that many of the sorts of queries
that one would choose to materialize... are not really amenable to this
treatment -- that's precisely because those make for the sorts of slow
queries that make you reach for materialization in the first place :(

But even so, automatically-generated triggers that mark an MV as needing
a refresh are always possible, and that is a huge improvement anyways,
especially if concurrent view refreshes can be made to go faster (by
having PKs on the MVs).  The idea is to have some sort of adaptive
automatic background, concurrent MV refresh running on a frequency based
in part of the amount of time it takes to refresh the VIEW.

BTW, MERGE would be a significant optimization for concurrent MV
refresh.  Think of MERGE as a statement that can scan a source, FULL
OUTER JOIN it to a target table, and for each row do an INSERT, UPDATE,
or DELETE -- this is 3x faster than the three INSERT/UPDATE/DELETE
statements you need to do the same work without a MERGE!

Nico
--


pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: Postgres 11 release notes
Next
From: Andres Freund
Date:
Subject: Re: WAL prefetch