Re: [v9.2] Fix Leaky View Problem - Mailing list pgsql-hackers
| From | Kohei KaiGai |
|---|---|
| Subject | Re: [v9.2] Fix Leaky View Problem |
| Date | |
| Msg-id | CADyhKSV8Xmp2ZmQq7Hj6M_7tQ2436QsLYWRb=ReMkLSbjatqFQ@mail.gmail.com Whole thread Raw |
| In response to | Re: [v9.2] Fix Leaky View Problem (Robert Haas <robertmhaas@gmail.com>) |
| Responses |
Re: [v9.2] Fix Leaky View Problem
|
| List | pgsql-hackers |
Hi Robert,
I'm a bit confusing about this sentence.
> If you can make this work, I think it could be a pretty sweet plannner
> optimization even apart from the implications for security views.
> Consider a query of this form:
>
> A LEFT JOIN B LEFT JOIN C
>
> where B is a view defined as:
>
> B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5
>
> Now let's suppose that from_collapse_limit/join_collapse_limit are set
> low enough that we decline to fold these subproblems together. If
> there happens to be a qual B.x = 1, where B.x is really B1.x, then the
> generated plan sucks, because it will basically lose the ability to
> filter B1 early, very possibly on, say, a unique index. Or at least a
> highly selective index.
>
I tried to reproduce the scenario with enough small from/join_collapse_limit
(typically 1), but it allows to push down qualifiers into the least scan plan.
E.g)
mytest=# SET from_collapse_limit = 1;
mytest=# SET join_collapse_limit = 1;
mytest=# CREATE VIEW B AS SELECT B1.* FROM B1,B2,B3 WHERE B1.x = B2.x
AND B2.x = B3.x;
mytest=# EXPLAIN SELECT * FROM A,B,C WHERE A.x=B.x AND B.x=C.x AND f_leak(B.y);
QUERYPLAN
------------------------------------------------------------------------------------Merge Join (cost=381.80..9597.97
rows=586624width=108) Merge Cond: (a.x = b1.x) -> Merge Join (cost=170.85..290.46 rows=7564 width=72) Merge
Cond:(a.x = c.x) -> Sort (cost=85.43..88.50 rows=1230 width=36) Sort Key: a.x ->
SeqScan on a (cost=0.00..22.30 rows=1230 width=36) -> Sort (cost=85.43..88.50 rows=1230 width=36)
Sort Key: c.x -> Seq Scan on c (cost=0.00..22.30 rows=1230 width=36) -> Materialize
(cost=210.95..528.56rows=15510 width=44) -> Merge Join (cost=210.95..489.78 rows=15510 width=44)
MergeCond: (b1.x = b3.x) -> Merge Join (cost=125.52..165.40 rows=2522 width=40) Merge
Cond:(b1.x = b2.x) -> Sort (cost=40.09..41.12 rows=410 width=36) Sort
Key:b1.x -> Seq Scan on b1 (cost=0.00..22.30
rows=410 width=36) Filter: f_leak(y) -> Sort (cost=85.43..88.50
rows=1230width=4) Sort Key: b2.x -> Seq Scan on b2
(cost=0.00..22.30
rows=1230 width=4) -> Sort (cost=85.43..88.50 rows=1230 width=4) Sort Key: b3.x
-> Seq Scan on b3 (cost=0.00..22.30 rows=1230 width=4)
(25 rows)
In this example, f_leak() takes an argument come from B1 table within B view,
and it was correctly distributed to SeqScan on B1.
From perspective of the code, the *_collapse_limit affects the contents of
joinlist being returned from deconstruct_jointree() whether its sub-portion is
flatten, or not.
However, the qualifiers are distributed on distribute_restrictinfo_to_rels() to
RelOptInfo based on its dependency of relations being referenced by
arguments. Thus, the above f_leak() was distributed to B1, not B, because
its arguments come from only B1.
I agree with the following approach to tackle this problem in 100%.
However, I'm unclear how from/join_collapse_limit affects to keep
sub-queries unflatten. It seems to me it is determined based on
the result of is_simple_subquery().
> 1. Let quals percolate down into subqueries.
> 2. Add the notion of a security view, which prevents flattening and
> disables the optimization of patch #1
> 3. Add the notion of a leakproof function, which can benefit from the
> optimization of #1 even when the view involved is a security view as
> introduced in #2
>
Thanks,
2011/10/11 Robert Haas <robertmhaas@gmail.com>:
> On Mon, Oct 10, 2011 at 4:28 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> I agreed. We have been on the standpoint that tries to prevent
>> leakable functions to reference a portion of join-tree being already
>> flatten, however, it has been a tough work.
>> It seems to me it is much simple approach that enables to push
>> down only non-leaky functions into inside of sub-queries.
>>
>> An idea is to add a hack on distribute_qual_to_rels() to relocate
>> a qualifier into inside of the sub-query, when it references only
>> a particular sub-query being come from a security view, and
>> when the sub-query satisfies is_simple_subquery(), for example.
>
> If you can make this work, I think it could be a pretty sweet plannner
> optimization even apart from the implications for security views.
> Consider a query of this form:
>
> A LEFT JOIN B LEFT JOIN C
>
> where B is a view defined as:
>
> B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5
>
> Now let's suppose that from_collapse_limit/join_collapse_limit are set
> low enough that we decline to fold these subproblems together. If
> there happens to be a qual B.x = 1, where B.x is really B1.x, then the
> generated plan sucks, because it will basically lose the ability to
> filter B1 early, very possibly on, say, a unique index. Or at least a
> highly selective index. If we could allow the B.x qual to trickle
> down inside of the subquery, we'd get a much better plan. Of course,
> it's still not as good as flattening, because it won't allow us to
> consider as many possible join orders - but the whole point of having
> from_collapse_limit/join_collapse_limit in the first place is that we
> can't consider all the join orders without having planning time and
> memory usage balloon wildly out of control. And in many real-world
> cases, I think that this would probably mitigate the effects of
> exceeding from_collapse_limit/join_collapse_limit quite a bit.
>
> In order to make it work, though, you'd need to arrange things so that
> we distribute quals to rels in the parent query, then let some of them
> filter down into the subquery, then distribute quals to rels in the
> subquery (possibly adjusting RTE indexes?), then finish planning the
> subquery, then finish planning the parent query. Not sure how
> possible/straightforward that is.
>
> It's probably a good idea to deal with this part first, because if you
> can't make it work then the whole approach is in trouble. I'm almost
> imagining that we could break this into three independent patches,
> like this:
>
> 1. Let quals percolate down into subqueries.
> 2. Add the notion of a security view, which prevents flattening and
> disables the optimization of patch #1
> 3. Add the notion of a leakproof function, which can benefit from the
> optimization of #1 even when the view involved is a security view as
> introduced in #2
>
> Unlike the way you have it now, I think those patches could be
> independently committable.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
pgsql-hackers by date: