Re: Materialized views WIP patch - Mailing list pgsql-hackers
From | Nicolas Barbier |
---|---|
Subject | Re: Materialized views WIP patch |
Date | |
Msg-id | CAP-rdTZaD=qP1T0uw=S6BgG-CwTBNc5bFid7mt30x+Q0uzvf5g@mail.gmail.com Whole thread Raw |
In response to | Re: Materialized views WIP patch (Kevin Grittner <kgrittn@ymail.com>) |
Responses |
Re: Materialized views WIP patch
|
List | pgsql-hackers |
2013/3/3 Kevin Grittner <kgrittn@ymail.com>: > Nicolas Barbier <nicolas.barbier@gmail.com> wrote: > >> I think that automatically using materialized views even when the >> query doesn’t mention them directly, is akin to automatically >> using indexes without having to mention them in the query. That >> way, queries can be written the natural way, and “creating >> materialized views” is an optimization that can be applied by a >> DBA without having to update the application queries to use them. > > Oh, I understand that concept perfectly well, I just wonder how > often it is useful in practice. The cost of planning with indexes > tends to go up dramatically the planner needs to evaluate all > possible combinations of access paths. We've devoted quite a bit > of energy keeping that from being something like the factorial of > the number of indexes. If you now need to find all materialized > views which could substitute for parts of a query, and look at all > permutations of how those could be used, and which indexes can be > used for each of those combinations, you have planning time which > can explode to extreme levels. I guess that a basic version of such a feature would do something like this: (1) Check for each matview whether it simply consists of an optional bunch of joins + optional aggregation + optional general filter conditions (to support something akin to a partial index). If not, the optimization doesn’t take this matview into account. This step can be done beforehand. (2) Check for each (sub)query in the query-to-optimize whether it does the following (at a smart point in the optimization phase). If any of these conditions are not met, don’t use this matview: - Joins at least the tables that are joined in the matview. - Containsjoin conditions and general filter conditions that are at least as strict. - Doesn’t refer elsewhere to any attributes that the matview doesn’t contain. (3) Always replace the corresponding query parts with the matview (i.e., assume that the cost will always be lower than performing the original query). (4) If multiple matviews fit in step 3, try them all (and use the one that yields the lower total cost). (5) Always replace any aggregation with the corresponding aggregation-results (if they exist) from the matview. That doesn’t sound as if it would make planning time explode that much (except, because of step 4, when there are many matviews that contain overlapping sets of joined tables, and a query joins over the union of those sets, *and* the replaceable-by-matviews parts are in subqueries). It could even decrease it significantly (e.g., in the case where a bunch of joins would be replaced with a scan of a matview). Also, I suppose that once such functionality exists, application writers would be more inclined to write “heavy” queries that do lots of aggregation even in an OLTP environment, of the kind that is these days typically only done in OLAP environments. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
pgsql-hackers by date: