Re: Materialized views WIP patch - Mailing list pgsql-hackers

From Nicolas Barbier
Subject Re: Materialized views WIP patch
Date
Msg-id CAP-rdTZaD=qP1T0uw=S6BgG-CwTBNc5bFid7mt30x+Q0uzvf5g@mail.gmail.com
Whole thread Raw
In response to Re: Materialized views WIP patch  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: Materialized views WIP patch  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
2013/3/3 Kevin Grittner <kgrittn@ymail.com>:

> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>
>> I think that automatically using materialized views even when the
>> query doesn’t mention them directly, is akin to automatically
>> using indexes without having to mention them in the query. That
>> way, queries can be written the natural way, and “creating
>> materialized views” is an optimization that can be applied by a
>> DBA without having to update the application queries to use them.
>
> Oh, I understand that concept perfectly well, I just wonder how
> often it is useful in practice.  The cost of planning with indexes
> tends to go up dramatically the planner needs to evaluate all
> possible combinations of access paths.  We've devoted quite a bit
> of energy keeping that from being something like the factorial of
> the number of indexes.  If you now need to find all materialized
> views which could substitute for parts of a query, and look at all
> permutations of how those could be used, and which indexes can be
> used for each of those combinations, you have planning time which
> can explode to extreme levels.

I guess that a basic version of such a feature would do something like this:

(1) Check for each matview whether it simply consists of an optional
bunch of joins + optional aggregation + optional general filter
conditions (to support something akin to a partial index). If not, the
optimization doesn’t take this matview into account. This step can be
done beforehand.
(2) Check for each (sub)query in the query-to-optimize whether it does
the following (at a smart point in the optimization phase). If any of
these conditions are not met, don’t use this matview: - Joins at least the tables that are joined in the matview. -
Containsjoin conditions and general filter conditions that are at 
least as strict. - Doesn’t refer elsewhere to any attributes that the matview doesn’t contain.
(3) Always replace the corresponding query parts with the matview
(i.e., assume that the cost will always be lower than performing the
original query).
(4) If multiple matviews fit in step 3, try them all (and use the one
that yields the lower total cost).
(5) Always replace any aggregation with the corresponding
aggregation-results (if they exist) from the matview.

That doesn’t sound as if it would make planning time explode that much
(except, because of step 4, when there are many matviews that contain
overlapping sets of joined tables, and a query joins over the union of
those sets, *and* the replaceable-by-matviews parts are in
subqueries). It could even decrease it significantly (e.g., in the
case where a bunch of joins would be replaced with a scan of a
matview).

Also, I suppose that once such functionality exists, application
writers would be more inclined to write “heavy” queries that do lots
of aggregation even in an OLTP environment, of the kind that is these
days typically only done in OLAP environments.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Materialized views WIP patch
Next
From: Greg Smith
Date:
Subject: Re: Enabling Checksums