Re: Query Rewrite for Materialized Views (Postgres Extension) - Mailing list pgsql-hackers

From Nico Williams
Subject Re: Query Rewrite for Materialized Views (Postgres Extension)
Date
Msg-id 20180618191355.GC4200@localhost
Whole thread Raw
In response to Re: Query Rewrite for Materialized Views (Postgres Extension)  (Dent John <denty@QQdd.eu>)
Responses Re: Query Rewrite for Materialized Views (Postgres Extension)
List pgsql-hackers
On Mon, Jun 18, 2018 at 07:38:13PM +0100, Dent John wrote:
> I commented to Corey (privately) that, while my rewrite extension has
> gotten me a server that responds quickly to aggregate queries, the
> constant need to refresh the supporting MVs means the system’s load
> average is constant and much higher than before. I’m happy with the
> tradeoff for now, but it’s a huge waste of energy, and I’m sure it
> must thrash my disk.
>
> I’m very interested in what other people think of Corey’s idea.

I've written an alternative materialization extension (entirely as
PlPgSQL) based on PG's internals, but my version has a few big wins that
might help here.  I'm thinking of properly integrating it with PG.  Some
of the features include:

 - you can write triggers that update the materialization

   This is because the materialization is just a regular table in my
   implementation.

 - you can mark a view as needing a refresh (e.g., in a trigger)

 - you can declare a PK, other constraints, and indexes on a
   materialization

   The DMLs used to refresh a view concurrently can take advantage of
   the PK and/or other indexes to go fast.

 - you get a history table which records updates to the materialization

   This is useful for generating incremental updates to external
   systems.

Keeping track of refresh times should help decide whether to use or not
use a materialization in some query, or whether to refresh it first, or
not use it at all.

One of the things I'd eventually like to do is analyze the view query
AST to automatically generate triggers to update materializations or
mark them as needing refreshes.  A first, very very rough sketch of such
an analysis looks like this:

 - if the view query has CTEs
   -> create triggers on all its table sources to mark the
      materialization as needing a refresh

 - else if a table appears more than once as a table source in the view
   query
   -> create triggers on that table that mark the materialization as
      needing a refresh

 - else if a table appears anywhere other than the top-level
   -> create triggers .. mark as needing refresh

 - else if a table is a right-side of a left join
   -> create triggers .. mark as needing refresh

 - else if a table has no PK
   -> create triggers .. mark as needing refresh

 - else if the query has no GROUP BY, or only does a GROUP BY on this
   table and a list of columns prefixed by the table's PK
   -> rewrite the query to have WHERE eq conditions on values for the
      table's PK columns

      analyze this query

      if the result shows this table source as the first table in the
      plan
      -> create triggers on this table to update the materialization
         directly from querying the source view

 - else
   -> create triggers .. mark as needing refresh


Nico
--


pgsql-hackers by date:

Previous
From: Jim Finnerty
Date:
Subject: Re: Query Rewrite for Materialized Views (FDW Extension)
Next
From: AJG
Date:
Subject: Re: Column store in Greenplum