Thread: [HACKERS] Contrib: alternative MATERIALIZED VIEWs
Attached is an alternative implementation of MATERIALIZED VIEWs. The idea is to explore possible enahncements to the PostgreSQL MATERIALIZED VIEW features. Features: - All SQL-coded. - Keeps history of deltas computed at each refresh. - Allows DMLs of the materialized view, recording the changes in the same way as deltas from refreshes. This allows one to code TRIGGERs which update materialized views directly. Where synchronous updates of an MV can be fast, a TRIGGER can do it by querying the source view with additional constraints derived from the OLD/NEW rows, and then apply DMLs to the materialization table. - MVs can be marked as needing a refresh. This is useful where a synchronous update would be too slow. Use a TRIGGER to mark the MV as needing a refresh and NOTIFY a waiting service. - Refreshes have the same concurrency semantics as REFRESH MATERIALIZED VIEW CONCURRENTLY. - Allows indices and constraints to be added to the materialzation table. Issues: - NULLs in columns of the VIEW cause spurious deltas to be recorded for every refresh. This is because two rows which are equal when considering NULLs to be equal are... not equal as far as SQL goes, thus such rows always appear to be deleted and inserted. This implementation uses a NATURAL FULL OUTER JOIN to compute the deltas between a before and an after materialization of a view during a refresh. This avoids having to generate a USING(), which is a very convenient simplification, but also part of the source of this problem with NULLs. (The history table has two record-type columns to hold entire rows.) - No integration. - Wonky column names in the history table ("awld", "noo"). Ideas: - CREATE MATERIALIZED VIEW should have these additional options, and ALTER MATERIALIZED VIEW should allow these to be specified after creation: - WITH [UNLOGGED] HISTORY TABLE schema_name.table_name - WITH PRIMARY KEY (column_list) -- probably not in ALTER MV though A PK on an MV does make sense when one considers the admonition in the PG docs to not have duplicate rows in the view... Besides, an MV has a materialization table, and tables generally should have PKs! - WITH CONSTRAINT <constraint> (same as in ALTER TABLE ADD constraint) Also, a new type of JOIN might be useful: one that joins using only columns that are part of the PK of both table sources. Obviously this would not be a generally-applicable JOIN type, as it would not work for table sources that are subqueries or plain views... But it would be useful here for generating the FULL OUTER JOIN needed for computing deltas between tables of the same form. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 1/23/17 5:38 PM, Nico Williams wrote: > Attached is an alternative implementation of MATERIALIZED VIEWs. Interesting. I don't see this being accepted into core because it's plpgsql and it depends on the user to track what criteria to use to apply the update. The second item is the biggest issue. That said, I think this would be useful to some people as an extension. I suggest you put it on github (or equivalent) and upload it to http://pgxn.org. In terms of community support, the next step is to get statement-level support for NEW and OLD, something I think Kevin has been working on. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On Mon, Jan 23, 2017 at 06:05:25PM -0600, Jim Nasby wrote: > On 1/23/17 5:38 PM, Nico Williams wrote: > >Attached is an alternative implementation of MATERIALIZED VIEWs. > > Interesting. I don't see this being accepted into core because it's plpgsql > and it depends on the user to track what criteria to use to apply the > update. The second item is the biggest issue. I myself said this is not properly integrated. I do use this in an actual system, but ideally if any of this is to be welcomed, then it'd have to be properly integrated. I don't see what's wrong with the use of plpgsql as the MV system in PG uses SQL, but in a proper integration a lot of this would be re-written in C (the use of SQL for the delta computation and the updates of the history table would remain, but I think too so would the triggers needed to update the history table when the MV is updated directly). As to the second issue... Just the other day Kevin Grittner was concerned about automatic MV updates because some of them can take too long. Now, PG could timeout MV updates, roll them back, and mark an MV as requiring a refresh, but the user might still have something to say about this: they really, really might want some updates to always happen synchronously, and others always asynchronously. I think it's really necessary to give the user this sort of control. > That said, I think this would be useful to some people as an extension. I > suggest you put it on github (or equivalent) and upload it to > http://pgxn.org. Ah, I forgot to mention that it is on github here: https://github.com/twosigma/postgresql-contrib > In terms of community support, the next step is to get statement-level > support for NEW and OLD, something I think Kevin has been working on. Well, I think there's a lot to look at here. That's partly why a plpgsql-coded implementation is a good first step, IMO: it helps find issues. For me the need for a PK for MVs is fairly important; but the need for deltas/history table is critical, and there is no reason not to have it considering that deltas are produced internally during REFRESH .. CONCURRENTLY. Nico --