Re: Materialized views WIP patch - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Materialized views WIP patch
Date
Msg-id 1362485755.5466.YahooMailNeo@web162902.mail.bf1.yahoo.com
In response to Re: Materialized views WIP patch  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Simon Riggs <simon@2ndQuadrant.com> wrote:
> On 3 March 2013 23:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Nicolas Barbier <nicolas.barbier@gmail.com> writes:
>>> 2013/3/3 Kevin Grittner <kgrittn@ymail.com>:
>>>> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>>>>> I think that automatically using materialized views even when
>>>>> the query doesn’t mention them directly, is akin to
>>>>> automatically using indexes without having to mention them in
>>>>> the query.
>>
>>>> Oh, I understand that concept perfectly well, I just wonder
>>>> how often it is useful in practice.
>>
>> There's a much more fundamental reason why this will never
>> happen, which is that the query planner is not licensed to
>> decide that you only want an approximate and not an exact answer
>> to your query.
>>
>> If MVs were guaranteed always up-to-date, maybe we could think
>> about automatic use of them --- but that's a far different
>> feature from what Kevin has here.
>
> It's not a different feature; it's what most people expect a
> feature called MV to deliver. That's not a matter of opinion, it's
> simply how every other database works currently - Oracle,
> Teradata, SQL Server at least. The fact that we don't allow MVs to
> automatically optimize queries is acceptable, as long as that is
> clearly marked in some way to avoid confusion, and I don't mean
> buried on p5 of the docs. What we have here is a partial
> implementation that can be improved upon over the next few
> releases. I hope no one is going to claim in the release notes
> that "Materialized Views" have been implemented in this release,
> because unqualified that would be seriously misleading and might
> even stifle further funding to improve things to the level already
> implemented elsewhere. Just to reiterate, I fully support
> committing this partial feature into Postgres in this release,
> because it will be a long haul to complete the full feature and
> what we have here is a reasonable stepping stone to get there.
>
> Transactionally up-to-date MVs can be used like indexes in the
> planner. The idea that this is impossible because of the
> permutations involved is somewhat ridiculous; there is much
> published work on optimising that, and some obvious optimisations.
> Clearly the cost varies according to the number of MVs and the
> number of tables they touch, not the overall complexity of the
> query. The overhead is probably the same as or less than that of
> partial indexes, which we currently think is acceptable. In any
> case, if you don't want that overhead, don't use MVs.
>
> Non-transactionally up-to-date MVs could also be used like
> indexes in the planner, if we gave the planner the "licence" it
> (clearly) lacks. If using MV makes a two-hour query return in 1
> minute, then using an MV that is 15 minutes out of date is likely
> to be a win. The "licence" is some kind of user parameter/option
> that specifies how stale an answer a query can return. For many
> queries that involve averages and sums, a stale or perhaps an
> approximate answer would hardly differ anyway. So I think there
> is room somewhere there for a "staleness" time specification by
> the user, allowing approximation.
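
For concreteness: in the patch under discussion an MV is only brought up to date by an explicit refresh, so the staleness "licence" Simon describes would need something like the sketch below. The CREATE and REFRESH statements reflect the patch's syntax; the table name and the mv_staleness_limit setting are invented for illustration and do not exist.

```sql
-- Syntax from the WIP patch: the MV is populated at creation time and
-- only brought up to date again by an explicit refresh.
CREATE MATERIALIZED VIEW sales_summary AS
    SELECT region, sum(amount) AS total_amount
    FROM sales
    GROUP BY region;

REFRESH MATERIALIZED VIEW sales_summary;

-- Hypothetical planner "licence" (not in the patch): permit automatic
-- substitution of any MV refreshed within the last 15 minutes.
SET mv_staleness_limit = '15min';
SELECT region, sum(amount) FROM sales GROUP BY region;
```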

I don't think I disagree with any of what Simon says other than his
feelings about the planning cost.  Imagine that there are ten MVs
that might apply to a complex query, but some of them are mutually
exclusive, so there are a large number of permutations of MVs which
could be used to replace parts of the original query.  And maybe
some of the base tables have indexes which could reduce execution
which aren't present in some or all of the MVs.  And each MV has a
number of indexes.  The combinatorial explosion of possible plans
would make it hard to constrain plan time without resorting to some
crude rules about what to choose.  That's not an unsolvable
problem, but I see it as a pretty big problem.
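A hypothetical schema makes the search-space concern concrete: two MVs derived from the same base table can each stand in for part of a query, but choosing one forecloses the other, and the base table's own indexes remain alternatives that must also be costed. All names below are invented.

```sql
CREATE MATERIALIZED VIEW mv_sales_by_day AS
    SELECT order_date, sum(amount) AS total
    FROM orders GROUP BY order_date;

CREATE MATERIALIZED VIEW mv_sales_by_customer AS
    SELECT customer_id, sum(amount) AS total
    FROM orders GROUP BY customer_id;

-- The planner could answer this by scanning orders (via any of its
-- indexes) or by re-aggregating mv_sales_by_day; mv_sales_by_customer
-- is of no use here.  With ten such MVs, each carrying its own
-- indexes, the set of substitution combinations to cost explodes.
SELECT date_trunc('month', order_date) AS month, sum(amount)
FROM orders
GROUP BY 1;
```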

I have no doubt that someone could take a big data warehouse and
add one or two MVs and show a dramatic improvement in the run time
of a query.  Almost as big as if the query were rewritten to use
the MV directly.  It would make for a very nice presentation, and
as long as they are used sparingly this could be a useful tool for
a data warehouse environment which is playing with alternative ways
to optimize slow queries which pass a lot of data.  In other
environments, I feel that it gets a lot harder to show a big win.
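The manual rewrite alluded to above is straightforward; sketched with invented names, the win comes from replacing an aggregate over a large fact table with a scan of a small precomputed one:

```sql
CREATE MATERIALIZED VIEW sales_summary AS
    SELECT region, sum(amount) AS total_amount
    FROM sales
    GROUP BY region;

-- Original warehouse query, aggregating the full fact table:
SELECT region, sum(amount) FROM sales GROUP BY region;

-- Rewritten by hand to scan the small precomputed result instead:
SELECT region, total_amount FROM sales_summary;
```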

The good news is that it sounds like we agree on the ideal
long-term feature set.  I'm just a lot more excited, based on the
use-cases I've seen, about the addition of incremental updates than
substituting MVs into query plans which reference the underlying
tables.  Perhaps that indicates a chance to reach the final feature set
sooner, through everyone scratching their own itches.  :-)

And we both seem to feel that some system for managing acceptable
levels of MV "freshness" is a necessary feature in order to go very
much further.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


