Re: counting algorithm for incremental matview maintenance - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: counting algorithm for incremental matview maintenance
Date
Msg-id 1368814052.93619.YahooMailNeo@web162903.mail.bf1.yahoo.com
In response to Re: counting algorithm for incremental matview maintenance  (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: counting algorithm for incremental matview maintenance  (Nicolas Barbier <nicolas.barbier@gmail.com>)
List pgsql-hackers
Amit Kapila <amit.kapila@huawei.com> wrote:
> On Wednesday, May 15, 2013 1:22 AM Kevin Grittner wrote:
>
> Good explanation for understanding the initial concept of
> incremental update of matviews.

Thanks.  This is one of those topics where it takes a lot of time
going over highly technical papers to really get your head around
it.  I'm hoping some simple examples will give a decent intuitive
grasp for those who want that but don't want to invest the time to
cover all the details.

At some point a README file will be needed.  I will probably start
with a Wiki that can evolve during development and be used to help
create the README file near the end of the release cycle.  Dan and
I did that for SSI and it seemed to work reasonably well.

>> The original and modified versions of the relations (tables or
>> other matviews) which define a matview must be available to
>> calculate the matview deltas, so the snapshot information for
>> this must be available and registered until the matview delta
>> has been calculated.  They can be released once the delta has
>> been established and before it has been applied to the matview.
>
> Here by modified versions of the relations, do you mean to say
> delta relations for recording changes.

During calculation of the deltas to apply to the matviews, it must
be possible to query the referenced tables from the perspective of
both the "before" and "after" versions of the data.
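
To make that concrete, here is a rough sketch in Python (not
PostgreSQL code, and not a proposed implementation) for a matview
defined as a two-table join.  All names are made up for
illustration.  Each relation is a bag mapping a row to a count, and
a delta is a bag with signed counts, as in the counting algorithm.
The matview delta here combines the delta of R with the "before"
state of S, and the "after" state of R with the delta of S, which
is why both versions have to be readable while the delta is built.

```python
from collections import Counter


def join(r, s):
    """Join two bags of (key, value) rows on key; counts multiply."""
    out = Counter()
    for (rk, rv), rc in r.items():
        for (sk, sv), sc in s.items():
            if rk == sk:
                out[(rk, rv, sv)] += rc * sc
    return out


def add(a, b):
    """Add two signed bags, dropping rows whose count nets to zero."""
    out = Counter(a)
    for row, count in b.items():
        out[row] += count
    return Counter({row: c for row, c in out.items() if c != 0})


def matview_delta(r_before, delta_r, s_before, delta_s):
    """Delta for the join of R and S (illustrative only).

    Uses the "before" version of S with the changes to R, and the
    "after" version of R with the changes to S.
    """
    r_after = add(r_before, delta_r)
    return add(join(delta_r, s_before), join(r_after, delta_s))
```

Applying the result of matview_delta() to the old join output with
add() should give the same bag as recomputing the join from the
"after" versions of both tables.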

> Could you elaborate a bit about snapshot information, is this
> snapshot is for delta relation

I don't think we need to use snapshots to read the deltas -- those
contain the information about the transition from one state to
another, rather than any persistent state which would be visible to
the user.  I see them more like tuple stores created as interim
steps in query execution.

> when will it acquire snapshot information to Update matviews?

In early milestones the work will be done in the context of
completing a statement or a transaction, and the MVCC data used to
update matviews will be from that context -- much like for changes
made within a trigger.

Once we move on to asynchronous matview maintenance it seems pretty
clear that we need to have queues and background worker processes.

If *deltas for the matview changes* are derived in the originating
transaction we might want one queue of deltas for each target
matview.  I don't think the process applying the deltas needs to do
anything unusual or special about acquiring snapshots as it
processes the delta from each transaction, unless we see some way
to optimize things once we're hip-deep in the details.
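
As a toy model of that arrangement (Python, purely illustrative;
the names and data structures are assumptions and nothing here
reflects an actual design), each target matview gets its own FIFO
queue, originating transactions append their signed-count deltas,
and a worker applies them in order:

```python
from collections import defaultdict, deque

# Hypothetical registry: matview name -> FIFO queue of deltas.
# Each delta maps a row to a signed change in its count.
delta_queues = defaultdict(deque)


def enqueue_delta(matview_name, delta):
    """Called on behalf of the originating transaction."""
    delta_queues[matview_name].append(delta)


def drain_queue(matview_name, contents):
    """Apply queued deltas, oldest first, to a row -> count mapping."""
    queue = delta_queues[matview_name]
    while queue:
        delta = queue.popleft()
        for row, change in delta.items():
            new_count = contents.get(row, 0) + change
            if new_count < 0:
                raise ValueError(f"negative count for row {row!r}")
            if new_count == 0:
                # The row disappears once its count reaches zero.
                contents.pop(row, None)
            else:
                contents[row] = new_count
    return contents
```

In this sketch the worker only needs the queued deltas and the
current matview contents; it never looks at the source tables.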

We could off-load the delta calculations for the matviews from the
originating transaction, but that would require a separate set of
queues and background workers, and it would require that snapshots from
the transaction remain registered so that they are valid for use by
the delta calculations.  It's not clear whether the benefit of that
would justify the extra complexity.  That would be, essentially,
another form of parallelizing the work of a connection; so I'm
leaving that as *possible* work later on, which is only likely to
be worthwhile if there is enough infrastructure from other work on
parallelization to reduce the scope here.

Let me know if that doesn't address what you wanted to know.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


