Re: matview niceties: pick any two of these three - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: matview niceties: pick any two of these three
Date
Msg-id 1367606094.67129.YahooMailNeo@web162901.mail.bf1.yahoo.com
Whole thread Raw
In response to Re: matview niceties: pick any two of these three  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: matview niceties: pick any two of these three  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The current matview design gets around this problem by requiring
> that transition between scannable and unscannable states involve
> a complete table rewrite, and thus the transactionality issue can
> be hidden behind a transactional update of the matview's
> pg_class.relfilenode field.  IMO, that is obviously a dead-end
> design, because we are going to want scannability status updates
> associated with partial updates of the matview's contents.

I don't think the change between populated and non-populated needs
to change without a new heap being swapped in.  To do incremental
maintenance, one must be starting with a populated heap.  What will
change without a new heap is the contents of the matview and
whether a populated matview is "fresh" enough to be scanned for
some specific usage.  There's a lot of bikeshedding left to be done
on what options we need there and how they will be stored; but I
don't see how that threatens the ability to use a zero-length heap
to indicate that a matview has not been populated.  Scannability
and the populated state only happen to be the same thing for the
moment, but improving the scannability test does not require a new
mechanism for determining whether the matview has been populated.
A new "populated" mechanism is, we all agree, desirable; but it's
not necessary to improve scannability testing.  We'll get into all
kinds of design trouble if we conflate these two separate concepts.

> Kevin's summary is leaving out one key desirable property:
>
> (4) ability to change scannability state without a full table
> rewrite.
>
> Putting the state into pg_class would preserve that property.

I could probably add a hundred other desirable properties for a
materialized view implementation; I was trying to list the ones
which were still up-for-grabs for the initial 9.3 implementation.
If we want to make progress on catching up to other major databases
on this, it will take progress in many releases; and every one of
those will lack something that someone wants.  OK, a lot of things
that a lot of people want.  But if we can't commit something that
is less than complete or less than perfect to make incremental
change, we're not going to be getting anywhere very fast.  I'm not
sure why you would want to make a non-populated matview scannable,
anyway; which seems like the only reason to have (4) without more
refined maintenance techniques.

I plan to submit a patch to the first 9.4 CF for differential
updates (generating the new contents in a temporary heap and
transactionally applying the changes to the existing matview heap)
as a REFRESH option.  I'm working on estimates for the amount of
work needed to support delta-driven incremental changes to matview
contents based on the declaration of the matview for various types
of matviews.  On a preliminary basis, I think that we can have fast
incremental updates in 9.4 without any imperative coding for all
non-recursive views, although including support in 9.4 for
incremental updates for views with window functions is still iffy,
and some desirable optimizations for aggregates might not make it.

I'm really very sure that incremental updates for recursive
matviews will not be a 9.4 feature, since that is best handled by
the "Delete and Rederive" (DRed) algorithm, while incremental
maintenance for non-recursive matviews is best handled with the
"counting" algorithm, and it seems unlikely that we'll get both
into one major release.

If I thought that the current hack for tracking the populated state
would be permanent, or that it would be so hard to replace in a
later release that it would impede further development, I would be
the first one to rip it out.  I haven't been able to see such a
risk, and nobody has adequately explained where they think the risk
lies.  The "crash during extension of the first page" argument
doesn't hold up unless you conflate the populated state with the
scannable state, and it seems likely that the scannable will not
only be separate, but able to differ for among backend processes.

If adding unlogged materialized views requires reworking support
for unlogged relations in general in a way that puts the data into
the catalog, and that becomes my issue to solve, I can deal with
that -- but that time would come out of time I would otherwise be
spending working on supporting incremental update of certain
classes of materialized views, and progress there will be slower
than I've been hoping.  It's all about trade-offs.  If the unlogged
recovery changes wind up on my plate, for example, it might mean
that the next release won't be able to have incremental updates for
matviews with a NOT EXISTS clause, since support for that looks to
be roughly the same amount of work.

I will certainly go along with the consensus, but from my
perspective, living with the current hack for determining whether a
matview has been populated is worth the benefit of having unlogged
matviews in 9.3.  I don't expect that hack to be viable for more
that one or two releases; I just think that it's a practical matter
of working with the tools we have *now* for unlogged relations to
try to create a new type of unlogged relation.  I'm all for better
tools and will be happy to use them once they exist.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Commit subject line
Next
From: Jeff Janes
Date:
Subject: Re: pg_ctl idempotent item removed