Thread: Query Rewrite for Materialized Views (FDW Extension)

Query Rewrite for Materialized Views (FDW Extension)

From
Dent John
Date:
Hi,

I wanted to share the project I've been working on which dynamically rewrites queries to target materialized views when
viewsare available that can satisfy part of a query with lower cost plans. 

I embarked upon as an interesting side project. It took me a bit more time than I anticipated, but the result works for
myuse case. Because of that, I thought it worth sharing. However I would caution that my use case is not exactly of a
commercialscale... so please heed the following obligatory warning: 

**NOTE: this is not "production ready" code — if it works for you, then great, but use it after thorough testing, and
withappropriate caution.** 

There are some limitations to the rewrite opportunities it takes up, and it will almost certainly fail on complex
materializedviews composed of deeply nested queries. 

The extension does not have extensive (actually: any) documentation, but the few test cases should make obvious to the
inclinedreader how it works. This is deliberate at this early a stage: I don't want to encourage uninformed adoption
becauseof the possibility of data loss or incorrect query rewrites. 

The extension is written against a Postgres 10.1 source tree.

Source code: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite

Questions or comments are very welcome.

denty.


Query Rewrite for Materialized Views (Postgres Extension)

From
John Dent
Date:
Hi folks,

I thought I’d share an update to my pet project, which dynamically rewrites queries to target materialized views when they are available and can satisfy a query (or part of it) with a lower cost plan.

The extension is now a regular EXTENSION and no longer tied in to the FDW mechanism. As a result, it may now be more generally usable, and less complicated to integrate into an existing system. (The FDW approach was an easy way for me to get started, but it ultimately added complexity and was rather limiting.)

Same caution as before applies:

**NOTE: this is not "production ready" code — if it works for you, then great, but use it after thorough testing, and with appropriate caution.**

Built, and has rudimentary testing against Postgres 10.1..10.3.


Hope it is useful or interesting for someone! Questions or comments are very welcome.

denty.

Begin original message:

From: Dent John <denty@QQdd.eu>
Subject: Query Rewrite for Materialized Views (FDW Extension)
Date: 5 April 2018 at 14:41:15 BST

Hi,

I wanted to share the project I've been working on which dynamically rewrites queries to target materialized views when views are available that can satisfy part of a query with lower cost plans.

I embarked upon as an interesting side project. It took me a bit more time than I anticipated, but the result works for my use case. Because of that, I thought it worth sharing. However I would caution that my use case is not exactly of a commercial scale... so please heed the following obligatory warning:

**NOTE: this is not "production ready" code — if it works for you, then great, but use it after thorough testing, and with appropriate caution.**

There are some limitations to the rewrite opportunities it takes up, and it will almost certainly fail on complex materialized views composed of deeply nested queries.

The extension does not have extensive (actually: any) documentation, but the few test cases should make obvious to the inclined reader how it works. This is deliberate at this early a stage: I don't want to encourage uninformed adoption because of the possibility of data loss or incorrect query rewrites.

The extension is written against a Postgres 10.1 source tree.

Source code: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite

Questions or comments are very welcome.

denty.

Re: Query Rewrite for Materialized Views (Postgres Extension)

From
Pavel Stehule
Date:


2018-06-16 16:21 GMT+02:00 John Dent <denty@qqdd.eu>:
Hi folks,

I thought I’d share an update to my pet project, which dynamically rewrites queries to target materialized views when they are available and can satisfy a query (or part of it) with a lower cost plan.

The extension is now a regular EXTENSION and no longer tied in to the FDW mechanism. As a result, it may now be more generally usable, and less complicated to integrate into an existing system. (The FDW approach was an easy way for me to get started, but it ultimately added complexity and was rather limiting.)

Same caution as before applies:

**NOTE: this is not "production ready" code — if it works for you, then great, but use it after thorough testing, and with appropriate caution.**

Built, and has rudimentary testing against Postgres 10.1..10.3.


Hope it is useful or interesting for someone! Questions or comments are very welcome.

good idea.

Regards

Pavel



denty.

Begin original message:

From: Dent John <denty@QQdd.eu>
Subject: Query Rewrite for Materialized Views (FDW Extension)
Date: 5 April 2018 at 14:41:15 BST

Hi,

I wanted to share the project I've been working on which dynamically rewrites queries to target materialized views when views are available that can satisfy part of a query with lower cost plans.

I embarked upon as an interesting side project. It took me a bit more time than I anticipated, but the result works for my use case. Because of that, I thought it worth sharing. However I would caution that my use case is not exactly of a commercial scale... so please heed the following obligatory warning:

**NOTE: this is not "production ready" code — if it works for you, then great, but use it after thorough testing, and with appropriate caution.**

There are some limitations to the rewrite opportunities it takes up, and it will almost certainly fail on complex materialized views composed of deeply nested queries.

The extension does not have extensive (actually: any) documentation, but the few test cases should make obvious to the inclined reader how it works. This is deliberate at this early a stage: I don't want to encourage uninformed adoption because of the possibility of data loss or incorrect query rewrites.

The extension is written against a Postgres 10.1 source tree.

Source code: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite

Questions or comments are very welcome.

denty.


Re: Query Rewrite for Materialized Views (Postgres Extension)

From
Corey Huinker
Date:
Hope it is useful or interesting for someone! Questions or comments are very welcome.

good idea.

Regards

Pavel

In a recent PgConf NYC presentation [1] I was talking about the technical hurdles to implementing materialized views that could be kept up to date at all times, and the benefits of having such a thing.

Some use cases can be addressed with eventually-consistent derivative table structures (Vertica's projections, PipelineDB's continuous views, etc), but those methods rely on the source data never having deletes or updates, or confining those updates to the "hot" part of the source tables, so it generally works for time-series data, but not for other cases.

It has occurred to me that Dave Fetter's work on ASSERTIONS [2] has common underpinnings with true continuous materialized views. In both cases, the creation of a system object causes the creations of insert/update/delete triggers on one or more existing tables. In the case of assertions, those triggers are run with the goal of raising an error if rows are returned from a query. In the case of a materialized view, those same triggers would be used to delete rows from a CMV and insert replacements rows. 

If we can get always-up-to-date materialized views, then Denty's work on query rewrite would have greatly enhanced utility.

 

Re: Query Rewrite for Materialized Views (Postgres Extension)

From
Dent John
Date:
I commented to Corey (privately) that, while my rewrite extension has gotten me a server that responds quickly to
aggregatequeries, the constant need to refresh the supporting MVs means the system’s load average is constant and much
higherthan before. I’m happy with the tradeoff for now, but it’s a huge waste of energy, and I’m sure it must thrash my
disk.

I’m very interested in what other people think of Corey’s idea.

Re: Query Rewrite for Materialized Views (FDW Extension)

From
Jim Finnerty
Date:
Hi John,

    Can you characterize the class of queries that the MVs eligible for
query rewrite may include, and the class of statements that may be rewritten
using those mvs, in terms of joins and join types, aggregation, constraints,
types of aggregate operators, and allowed aggregate expressions?

    For example:   
        - the mv may contain any number of joins, which may be either inner
or left outer, semi-, or anti joins
        - aggregates are optional, but if present, all columns in aggregate
expressions must refer to the same table
        - MIN, MAX, SUM, COUNT, AVG aggregate operations are allowed
        - to be eligible for query rewrite a user must ... <fill in the
details>
        - local predicates are / not allowed in the MV

        - the statement being rewritten may have one or more joins.  
        - the joins need not be identical to the corresponding joins in the
MV, but <fill in the details>
        - there may be joins in the mv that are not in the statement
provided that ... <fill in the details>
        - any restrictions or assumed dependence on fdw's, since the name
'fdw extension' implies as much
        - MV rewrite will / not be cost-based
        - The statement may / not have local predicates that are not
identical to the local predicates in the mv

    If you're going to do MV rewrite using MVs that aren't transactionally
consistent with the underlying tables, then you're going to need a different
sort of isolation mode, or some sort of permissions model that enables users
to opt-in to permit the optimizer to give back "wrong results" from
potentially stale MVs.

    I didn't attend the Ottawa conference this year, so I'd be interested to
hear a summary of what the community thinks about MV incremental maintenance
as well.  There are several important subtypes of MV incremental
maintenance:

    1) incremental refresh on-commit, for individual row DML  (enables MVs
to be transactionally consistent, but incurs overhead at COMMIT time)
    2) bulk incremental refresh on commit or on demand, after a COPY
operation  (more for a DW use case, but much more efficient for bulk-insert
scenarios, and permits some optimizations that are not possible for the
first case)
    3) full refresh by partition  (not an automatic solution, and there are
some subtleties here about rows that move from one partition to another, but
it's an easy first step toward incremental mv maintenance without most of
the complexity)

there is also the possibility of doing mv refresh from other mv's that have
already been refreshed, so mv rewrite and mv refresh sometimes interact.  

thank you,

    /Jim F



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: Query Rewrite for Materialized Views (Postgres Extension)

From
Nico Williams
Date:
On Mon, Jun 18, 2018 at 07:38:13PM +0100, Dent John wrote:
> I commented to Corey (privately) that, while my rewrite extension has
> gotten me a server that responds quickly to aggregate queries, the
> constant need to refresh the supporting MVs means the system’s load
> average is constant and much higher than before. I’m happy with the
> tradeoff for now, but it’s a huge waste of energy, and I’m sure it
> must thrash my disk.
>
> I’m very interested in what other people think of Corey’s idea.

I've written an alternative materialization extension (entirely as
PlPgSQL) based on PG's internals, but my version has a few big wins that
might help here.  I'm thinking of properly integrating it with PG.  Some
of the features include:

 - you can write triggers that update the materialization

   This is because the materialization is just a regular table in my
   implementation.

 - you can mark a view as needing a refresh (e.g., in a trigger)

 - you can declare a PK, other constraints, and indexes on a
   materialization

   The DMLs used to refresh a view concurrently can take advantage of
   the PK and/or other indexes to go fast.

 - you get a history table which records updates to the materialization

   This is useful for generating incremental updates to external
   systems.

Keeping track of refresh times should help decide whether to use or not
use a materialization in some query, or whether to refresh it first, or
not use it at all.

One of the things I'd eventually like to do is analyze the view query
AST to automatically generate triggers to update materializations or
mark them as needing refreshes.  A first, very very rough sketch of such
an analysis looks like this:

 - if the view query has CTEs
   -> create triggers on all its table sources to mark the
      materialization as needing a refresh

 - else if a table appears more than once as a table source in the view
   query
   -> create triggers on that table that mark the materialization as
      needing a refresh

 - else if a table appears anywhere other than the top-level
   -> create triggers .. mark as needing refresh

 - else if a table is a right-side of a left join
   -> create triggers .. mark as needing refresh

 - else if a table has no PK
   -> create triggers .. mark as needing refresh

 - else if the query has no GROUP BY, or only does a GROUP BY on this
   table and a list of columns prefixed by the table's PK
   -> rewrite the query to have WHERE eq conditions on values for the
      table's PK columns

      analyze this query

      if the result shows this table source as the first table in the
      plan
      -> create triggers on this table to update the materialization
         directly from querying the source view

 - else
   -> create triggers .. mark as needing refresh


Nico
--


Re: Query Rewrite for Materialized Views (Postgres Extension)

From
Dent John
Date:
Hi Nico,

I’m pretty impressed anything in this space can be written entirely in PlPGQSL!

If you did integrate your implementation, it would be easy for my Extension to read from a table other than the one
whichit gets the MV definition from... Although having said that, if you went down the route you suggest, would not you
makethat “regular table” into a first class scheme object very much like Corey’s CONTINUOUS MATERIALIZED VIEW object
concept?

It is interesting that you can put triggers onto the table though, as that leads well in to use cases where it is
desirableto “stack” MVs upon each other. (I’m not immediately sure whether such a use case is still needed in face of
analways-up-to-date MV feature such as is described, but I’ve seen it elsewhere.) 

You say you’d like to base it off a VIEW’s AST (rather than, I presume, you must parse the reconstructed VIEW source
textas SQL?), and I do agree — that’s probably the right direction... it does seem to me there is scope to leverage the
“bottomhalf” of the ASSERTION stuff from Dave Fetter that Corey linked to — i.e., the part of it that manages triggers.
Stillleaves the AST crawling deciding what to actually do once a change is caught. 

Really good to hear about progress in this area.

d.



Re: Query Rewrite for Materialized Views (Postgres Extension)

From
Nico Williams
Date:
On Tue, Jun 19, 2018 at 08:46:06AM +0100, Dent John wrote:
> I’m pretty impressed anything in this space can be written entirely in
> PlPGQSL!

https://github.com/twosigma/postgresql-contrib

PG is quite powerful!

I have even implemented a COMMIT TRIGGER in pure PlPgSQL.

You'll notice I make extensive use of record/table types.

> If you did integrate your implementation, it would be easy for my
> Extension to read from a table other than the one which it gets the MV
> definition from... Although having said that, if you went down the
> route you suggest, would not you make that “regular table” into a
> first class scheme object very much like Corey’s CONTINUOUS
> MATERIALIZED VIEW object concept?

I know nothing about the CONTINUOUS MATERIALIZED VIEW concept.  What
that would imply to me seems... difficult to achieve.  There will be
view queries that are difficult or impossible to automatically write
triggers for that update an MV synchronously.

> It is interesting that you can put triggers onto the table though, as
> that leads well in to use cases where it is desirable to “stack” MVs
> upon each other. (I’m not immediately sure whether such a use case is
> still needed in face of an always-up-to-date MV feature such as is
> described, but I’ve seen it elsewhere.)

I have done exactly this sort of MV chaining.

In my use case I had an MV of a nesting group transitive closure and
then another of a join between that and user group memberships to get a
complete user group transitive closure.

The transitive closure of nesting groups being computed via a RECURSIVE
CTE...  In principle one can understand such a query and automatically
write DMLs to update the MV on the fly (I've done this _manually_), but
the moment you do any bulk updates out of sequence you can't, and then
you have to refresh the view, so you see, I don't quite believe we can
have a true continuously materialized view :(

For me the key requirement is the ability to generate incremental
updates to an external system, but also the whole thing has to be fast.

> You say you’d like to base it off a VIEW’s AST (rather than, I
> presume, you must parse the reconstructed VIEW source text as SQL?),

PG already stores the AST.  There's no need to write a new parser when
PG already has one.  At the end of the day you need to analyze an AST
for the MV's source query in order to automatically write the triggers
to keep it updated (or mark it as needing a refresh).

> and I do agree — that’s probably the right direction... it does seem
> to me there is scope to leverage the “bottom half” of the ASSERTION
> stuff from Dave Fetter that Corey linked to — i.e., the part of it
> that manages triggers. Still leaves the AST crawling deciding what to
> actually do once a change is caught.

I'll search for this.

> Really good to hear about progress in this area.

Eh, I've not actually implemented any automatic generation of triggers
to update MVs.  I've written enough such triggers manually to believe
that *some* of them could be written by software.  If you look at my
sketch for how to do it, you'll notice that many of the sorts of queries
that one would choose to materialize... are not really amenable to this
treatment -- that's precisely because those make for the sorts of slow
queries that make you reach for materialization in the first place :(

But even so, automatically-generated triggers that mark an MV as needing
a refresh are always possible, and that is a huge improvement anyways,
especially if concurrent view refreshes can be made to go faster (by
having PKs on the MVs).  The idea is to have some sort of adaptive
automatic background, concurrent MV refresh running on a frequency based
in part of the amount of time it takes to refresh the VIEW.

BTW, MERGE would be a significant optimization for concurrent MV
refresh.  Think of MERGE as a statement that can scan a source, FULL
OUTER JOIN it to a target table, and for each row do an INSERT, UPDATE,
or DELETE -- this is 3x faster than the three INSERT/UPDATE/DELETE
statements you need to do the same work without a MERGE!

Nico
--


Re: Query Rewrite for Materialized Views (Postgres Extension)

From
Dent John
Date:
Hi Nico,

By the way, I do agree with your point about MERGE — if we can factor MV updates in that fashion, it will certainly save.

I didn’t reply immediately because your point caught me off guard:

[…]  If you look at my
sketch for how to do it, you'll notice that many of the sorts of queries
that one would choose to materialize... are not really amenable to this
treatment […]

I’d rather presumed that there would be many examples of DML on base relations that could trigger a direct (incremental) update to the MV. I had presumed it, but not actually done any research.

So I did a bit of a trawl. There’s actually quite a lot of academic research out there, including [1] and [2]. [2] references a bunch of methods for incremental MV refresh, and ties them into a graph query context. I’m not sure if the graph query context itself is relevant for Postgres, but it’s certainly interesting and perhaps suggests that incremental refresh of at least some RECURSIVE MVs may not be entirely impossible. I also found [3], which is /very/ dated, but it strongly supports that MVs are a performant path to executing certain types of query.

So I definitely agree with you in the general case, but it seems there is scope to provide an incremental MV refresh capability that is broadly useful.

Almost certainly, any initial implementation would quickly fall back to a “full refresh”. But the refresh planner’s capability develops, I wonder if it could not embody an intelligent strategy that might even recognise common recursive patterns such as the transitive closure you mention, and refresh on an incremental basis — that would be really quite a cool capability to have.

denty.