Re: refresh materialized view concurrently - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: refresh materialized view concurrently
Date
Msg-id 1371426229.13512.YahooMailNeo@web162906.mail.bf1.yahoo.com
Whole thread Raw
In response to Re: refresh materialized view concurrently  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: refresh materialized view concurrently
List pgsql-hackers
Simon Riggs <simon@2ndQuadrant.com> wrote:
> On 14 June 2013 17:05, Kevin Grittner <kgrittn@ymail.com> wrote:
>> Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY
>> for 9.4 CF1.  The goal of this patch is to allow a refresh
>> without interfering with concurrent reads, using transactional
>> semantics.
>
> Is there a reason to keep the non-concurrent behaviour?

Yes.  For initial population, truncation, and replacement of
contents when more than a small percentage of rows are affected.

> Anybody that wants to explicitly lock should just run a LOCK
> statement. Can't we treat behaviour when fully locked as an
> optimisation, so we can just do the right thing without extra
> thought and keywords?

Are you suggesting that we use heap replacement or DML depending on
what heavyweight locks held when the statement is executed?

>> It is my hope to get this committed during this CF to allow me
>> to focus on incremental maintenance for the rest of the release
>> cycle.
>
> Incremental maintenance will be very straightforward using the
> logical changeset extraction code Andres is working on.

At most, changeset extraction will help with obtaining the initial
delta for the base relations, which is less than 5% of what needs
doing for incremental maintenance of materialized views.  If it
looks like a good fit, of course I'll use it.

> Having two parallel mechanisms for changeset extraction in one
> release seems like a waste of time.

I haven't looked in depth at what technique to use for capturing
the base relation deltas.  The changeset extraction technique is
something to consider, for sure.  I have a lot of work left to see
whether it works for this.  In particular, to handle all requested
timings, it would need to have low enough latency to provide a
delta during the completion of each DML statement, to support
requests for "eager" maintenance of a materialized view -- where
the transaction which just changed the base relation would see the
effect if they queried the matview.  That may not be the something
to try to tackle in this release, but there are people who want it,
and I would prefer to pick a technique which didn't have a latency
high enough to make that impractical.  That's not to say that I
know that to be a problem for using the changeset extraction
technique for this -- just that I haven't gotten to the point of
evaluating that.

> Especially when one is known to be better than the other already.

What is the hypothetical technique you're arguing is inferior?  For
my own part, I haven't gotten beyond the phase of knowing that to
meet all requests for the feature, it would need to be available at
about the same point that AFTER EACH STATEMENT triggers fire, but
that it should not involve any user-written triggers.  Have you
implemented something similar to what you think I might be
considering?  Do you have benchmark results?  Can you share
details?

> Given that we also want to do concurrent CLUSTER and ALTER TABLE
> ... SET TABLESPACE using changeset extraction I think its time
> that discussion happened on hackers.

No objections to that here; but please don't hijack this thread for
that discussion.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: minor patch submission: CREATE CAST ... AS EXPLICIT
Next
From: Stephen Frost
Date:
Subject: Re: [9.4 CF 1] Commit Fest has started