Re: Implementing Incremental View Maintenance - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Implementing Incremental View Maintenance |
Date | |
Msg-id | CA+hUKG+2ze0_kcZt9AAUDufP0cNL+jtS5BsTmys3c-9S3Ecfng@mail.gmail.com Whole thread Raw |
In response to | Re: Implementing Incremental View Maintenance (Yugo NAGATA <nagata@sraoss.co.jp>) |
Responses |
Re: Implementing Incremental View Maintenance
|
List | pgsql-hackers |
Hi Nagata-san, On Mon, Aug 31, 2020 at 5:32 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > https://wiki.postgresql.org/wiki/Incremental_View_Maintenance Thanks for writing this! + /* + * Wait for concurrent transactions which update this materialized view at + * READ COMMITED. This is needed to see changes committed in other + * transactions. No wait and raise an error at REPEATABLE READ or + * SERIALIZABLE to prevent update anomalies of matviews. + * XXX: dead-lock is possible here. + */ + if (!IsolationUsesXactSnapshot()) + LockRelationOid(matviewOid, ExclusiveLock); + else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock)) Could you please say a bit more about your plans for concurrency control? Simple hand-crafted "rollup" triggers typically conflict only when modifying the same output rows due to update/insert conflicts, or perhaps some explicit row level locking if they're doing something complex (unfortunately, they also very often have concurrency bugs...). In some initial reading about MV maintenance I did today in the hope of understanding some more context for this very impressive but rather intimidating patch set, I gained the impression that aggregate-row locking granularity is assumed as a baseline for eager incremental aggregate maintenance. I understand that our MVCC/snapshot scheme introduces extra problems, but I'm wondering if these problems can be solved using the usual update semantics (the EvalPlanQual mechanism), and perhaps also some UPSERT logic. Why is it not sufficient to have locked all the base table rows that you have modified, captured the before-and-after values generated by those updates, and also locked all the IMV aggregate rows you will read, and in the process acquired a view of the latest committed state of the IMV aggregate rows you will modify (possibly having waited first)? In other words, what other data do you look at, while computing the incremental update, that might suffer from anomalies because of snapshots and concurrency? For one thing, I am aware that unique indexes for groups would probably be necessary; perhaps some subtle problems of the sort usually solved with predicate locks lurk there? (Newer papers describe locking schemes that avoid even aggregate-row level conflicts, by taking advantage of the associativity and commutativity of aggregates like SUM and COUNT. You can allow N writers to update the aggregate concurrently, and if any transaction has to roll back it subtracts what it added, not necessarily restoring the original value, so that nobody conflicts with anyone else, or something like that... Contemplating an MVCC, no-rollbacks version of that sort of thing leads to ideas like, I dunno, update chains containing differential update trees to be compacted later... egad!)
pgsql-hackers by date: