Thread: [HACKERS] Contrib: alternative MATERIALIZED VIEWs

[HACKERS] Contrib: alternative MATERIALIZED VIEWs

From

Nico Williams

Date:

24 January 2017, 02:38:20

Attached is an alternative implementation of MATERIALIZED VIEWs.

The idea is to explore possible enahncements to the PostgreSQL
MATERIALIZED VIEW features.

Features:

 - All SQL-coded.

 - Keeps history of deltas computed at each refresh.

 - Allows DMLs of the materialized view, recording the changes in the
   same way as deltas from refreshes.

   This allows one to code TRIGGERs which update materialized views
   directly.

   Where synchronous updates of an MV can be fast, a TRIGGER can do it
   by querying the source view with additional constraints derived from
   the OLD/NEW rows, and then apply DMLs to the materialization table.

 - MVs can be marked as needing a refresh.

   This is useful where a synchronous update would be too slow.  Use a
   TRIGGER to mark the MV as needing a refresh and NOTIFY a waiting
   service.

 - Refreshes have the same concurrency semantics as REFRESH MATERIALIZED
   VIEW CONCURRENTLY.

 - Allows indices and constraints to be added to the materialzation
   table.

Issues:

 - NULLs in columns of the VIEW cause spurious deltas to be recorded for
   every refresh.  This is because two rows which are equal when
   considering NULLs to be equal are... not equal as far as SQL goes,
   thus such rows always appear to be deleted and inserted.

   This implementation uses a NATURAL FULL OUTER JOIN to compute the
   deltas between a before and an after materialization of a view during
   a refresh.  This avoids having to generate a USING(), which is a very
   convenient simplification, but also part of the source of this
   problem with NULLs.  (The history table has two record-type columns
   to hold entire rows.)

 - No integration.

 - Wonky column names in the history table ("awld", "noo").

Ideas:

 - CREATE MATERIALIZED VIEW should have these additional options, and
   ALTER MATERIALIZED VIEW should allow these to be specified after
   creation:

    - WITH [UNLOGGED] HISTORY TABLE schema_name.table_name

    - WITH PRIMARY KEY (column_list) -- probably not in ALTER MV though

      A PK on an MV does make sense when one considers the admonition in
      the PG docs to not have duplicate rows in the view...  Besides, an
      MV has a materialization table, and tables generally should have
      PKs!

    - WITH CONSTRAINT <constraint> (same as in ALTER TABLE ADD
      constraint)


Also, a new type of JOIN might be useful: one that joins using only
columns that are part of the PK of both table sources.  Obviously this
would not be a generally-applicable JOIN type, as it would not work for
table sources that are subqueries or plain views...  But it would be
useful here for generating the FULL OUTER JOIN needed for computing
deltas between tables of the same form.

Nico
--

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pseudo_mat_views.sql

Re: [HACKERS] Contrib: alternative MATERIALIZED VIEWs

From

Jim Nasby

Date:

24 January 2017, 03:05:25

On 1/23/17 5:38 PM, Nico Williams wrote:
> Attached is an alternative implementation of MATERIALIZED VIEWs.

Interesting. I don't see this being accepted into core because it's 
plpgsql and it depends on the user to track what criteria to use to 
apply the update. The second item is the biggest issue.

That said, I think this would be useful to some people as an extension. 
I suggest you put it on github (or equivalent) and upload it to 
http://pgxn.org.

In terms of community support, the next step is to get statement-level 
support for NEW and OLD, something I think Kevin has been working on.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

Re: [HACKERS] Contrib: alternative MATERIALIZED VIEWs

From

Nico Williams

Date:

24 January 2017, 07:09:39

On Mon, Jan 23, 2017 at 06:05:25PM -0600, Jim Nasby wrote:
> On 1/23/17 5:38 PM, Nico Williams wrote:
> >Attached is an alternative implementation of MATERIALIZED VIEWs.
> 
> Interesting. I don't see this being accepted into core because it's plpgsql
> and it depends on the user to track what criteria to use to apply the
> update. The second item is the biggest issue.

I myself said this is not properly integrated.  I do use this in an
actual system, but ideally if any of this is to be welcomed, then it'd
have to be properly integrated.  I don't see what's wrong with the use
of plpgsql as the MV system in PG uses SQL, but in a proper integration
a lot of this would be re-written in C (the use of SQL for the delta
computation and the updates of the history table would remain, but I
think too so would the triggers needed to update the history table when
the MV is updated directly).

As to the second issue...  Just the other day Kevin Grittner was
concerned about automatic MV updates because some of them can take too
long.  Now, PG could timeout MV updates, roll them back, and mark an MV
as requiring a refresh, but the user might still have something to say
about this: they really, really might want some updates to always happen
synchronously, and others always asynchronously.  I think it's really
necessary to give the user this sort of control.

> That said, I think this would be useful to some people as an extension. I
> suggest you put it on github (or equivalent) and upload it to
> http://pgxn.org.

Ah, I forgot to mention that it is on github here:

https://github.com/twosigma/postgresql-contrib

> In terms of community support, the next step is to get statement-level
> support for NEW and OLD, something I think Kevin has been working on.

Well, I think there's a lot to look at here.  That's partly why a
plpgsql-coded implementation is a good first step, IMO: it helps find
issues.  For me the need for a PK for MVs is fairly important; but the need
for deltas/history table is critical, and there is no reason not to have
it considering that deltas are produced internally during REFRESH ..
CONCURRENTLY.

Nico
--