Re: [v9.2] Fix Leaky View Problem - Mailing list pgsql-hackers
From | Kohei KaiGai |
---|---|
Subject | Re: [v9.2] Fix Leaky View Problem |
Date | |
Msg-id | CADyhKSV8Xmp2ZmQq7Hj6M_7tQ2436QsLYWRb=ReMkLSbjatqFQ@mail.gmail.com Whole thread Raw |
In response to | Re: [v9.2] Fix Leaky View Problem (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [v9.2] Fix Leaky View Problem
|
List | pgsql-hackers |
Hi Robert, I'm a bit confusing about this sentence. > If you can make this work, I think it could be a pretty sweet plannner > optimization even apart from the implications for security views. > Consider a query of this form: > > A LEFT JOIN B LEFT JOIN C > > where B is a view defined as: > > B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5 > > Now let's suppose that from_collapse_limit/join_collapse_limit are set > low enough that we decline to fold these subproblems together. If > there happens to be a qual B.x = 1, where B.x is really B1.x, then the > generated plan sucks, because it will basically lose the ability to > filter B1 early, very possibly on, say, a unique index. Or at least a > highly selective index. > I tried to reproduce the scenario with enough small from/join_collapse_limit (typically 1), but it allows to push down qualifiers into the least scan plan. E.g) mytest=# SET from_collapse_limit = 1; mytest=# SET join_collapse_limit = 1; mytest=# CREATE VIEW B AS SELECT B1.* FROM B1,B2,B3 WHERE B1.x = B2.x AND B2.x = B3.x; mytest=# EXPLAIN SELECT * FROM A,B,C WHERE A.x=B.x AND B.x=C.x AND f_leak(B.y); QUERYPLAN ------------------------------------------------------------------------------------Merge Join (cost=381.80..9597.97 rows=586624width=108) Merge Cond: (a.x = b1.x) -> Merge Join (cost=170.85..290.46 rows=7564 width=72) Merge Cond:(a.x = c.x) -> Sort (cost=85.43..88.50 rows=1230 width=36) Sort Key: a.x -> SeqScan on a (cost=0.00..22.30 rows=1230 width=36) -> Sort (cost=85.43..88.50 rows=1230 width=36) Sort Key: c.x -> Seq Scan on c (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=210.95..528.56rows=15510 width=44) -> Merge Join (cost=210.95..489.78 rows=15510 width=44) MergeCond: (b1.x = b3.x) -> Merge Join (cost=125.52..165.40 rows=2522 width=40) Merge Cond:(b1.x = b2.x) -> Sort (cost=40.09..41.12 rows=410 width=36) Sort Key:b1.x -> Seq Scan on b1 (cost=0.00..22.30 rows=410 width=36) Filter: f_leak(y) -> Sort (cost=85.43..88.50 rows=1230width=4) Sort Key: b2.x -> Seq Scan on b2 (cost=0.00..22.30 rows=1230 width=4) -> Sort (cost=85.43..88.50 rows=1230 width=4) Sort Key: b3.x -> Seq Scan on b3 (cost=0.00..22.30 rows=1230 width=4) (25 rows) In this example, f_leak() takes an argument come from B1 table within B view, and it was correctly distributed to SeqScan on B1. From perspective of the code, the *_collapse_limit affects the contents of joinlist being returned from deconstruct_jointree() whether its sub-portion is flatten, or not. However, the qualifiers are distributed on distribute_restrictinfo_to_rels() to RelOptInfo based on its dependency of relations being referenced by arguments. Thus, the above f_leak() was distributed to B1, not B, because its arguments come from only B1. I agree with the following approach to tackle this problem in 100%. However, I'm unclear how from/join_collapse_limit affects to keep sub-queries unflatten. It seems to me it is determined based on the result of is_simple_subquery(). > 1. Let quals percolate down into subqueries. > 2. Add the notion of a security view, which prevents flattening and > disables the optimization of patch #1 > 3. Add the notion of a leakproof function, which can benefit from the > optimization of #1 even when the view involved is a security view as > introduced in #2 > Thanks, 2011/10/11 Robert Haas <robertmhaas@gmail.com>: > On Mon, Oct 10, 2011 at 4:28 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: >> I agreed. We have been on the standpoint that tries to prevent >> leakable functions to reference a portion of join-tree being already >> flatten, however, it has been a tough work. >> It seems to me it is much simple approach that enables to push >> down only non-leaky functions into inside of sub-queries. >> >> An idea is to add a hack on distribute_qual_to_rels() to relocate >> a qualifier into inside of the sub-query, when it references only >> a particular sub-query being come from a security view, and >> when the sub-query satisfies is_simple_subquery(), for example. > > If you can make this work, I think it could be a pretty sweet plannner > optimization even apart from the implications for security views. > Consider a query of this form: > > A LEFT JOIN B LEFT JOIN C > > where B is a view defined as: > > B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5 > > Now let's suppose that from_collapse_limit/join_collapse_limit are set > low enough that we decline to fold these subproblems together. If > there happens to be a qual B.x = 1, where B.x is really B1.x, then the > generated plan sucks, because it will basically lose the ability to > filter B1 early, very possibly on, say, a unique index. Or at least a > highly selective index. If we could allow the B.x qual to trickle > down inside of the subquery, we'd get a much better plan. Of course, > it's still not as good as flattening, because it won't allow us to > consider as many possible join orders - but the whole point of having > from_collapse_limit/join_collapse_limit in the first place is that we > can't consider all the join orders without having planning time and > memory usage balloon wildly out of control. And in many real-world > cases, I think that this would probably mitigate the effects of > exceeding from_collapse_limit/join_collapse_limit quite a bit. > > In order to make it work, though, you'd need to arrange things so that > we distribute quals to rels in the parent query, then let some of them > filter down into the subquery, then distribute quals to rels in the > subquery (possibly adjusting RTE indexes?), then finish planning the > subquery, then finish planning the parent query. Not sure how > possible/straightforward that is. > > It's probably a good idea to deal with this part first, because if you > can't make it work then the whole approach is in trouble. I'm almost > imagining that we could break this into three independent patches, > like this: > > 1. Let quals percolate down into subqueries. > 2. Add the notion of a security view, which prevents flattening and > disables the optimization of patch #1 > 3. Add the notion of a leakproof function, which can benefit from the > optimization of #1 even when the view involved is a security view as > introduced in #2 > > Unlike the way you have it now, I think those patches could be > independently committable. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- KaiGai Kohei <kaigai@kaigai.gr.jp>
pgsql-hackers by date: