Re: [v9.2] Fix Leaky View Problem - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Re: [v9.2] Fix Leaky View Problem
Date
Msg-id CADyhKSV8Xmp2ZmQq7Hj6M_7tQ2436QsLYWRb=ReMkLSbjatqFQ@mail.gmail.com
Whole thread Raw
In response to Re: [v9.2] Fix Leaky View Problem  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [v9.2] Fix Leaky View Problem
List pgsql-hackers
Hi Robert,

I'm a bit confusing about this sentence.

> If you can make this work, I think it could be a pretty sweet plannner
> optimization even apart from the implications for security views.
> Consider a query of this form:
>
> A LEFT JOIN B LEFT JOIN C
>
> where B is a view defined as:
>
> B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5
>
> Now let's suppose that from_collapse_limit/join_collapse_limit are set
> low enough that we decline to fold these subproblems together.  If
> there happens to be a qual B.x = 1, where B.x is really B1.x, then the
> generated plan sucks, because it will basically lose the ability to
> filter B1 early, very possibly on, say, a unique index.  Or at least a
> highly selective index.
>

I tried to reproduce the scenario with enough small from/join_collapse_limit
(typically 1), but it allows to push down qualifiers into the least scan plan.

E.g)
mytest=# SET from_collapse_limit = 1;
mytest=# SET join_collapse_limit = 1;
mytest=# CREATE VIEW B AS SELECT B1.* FROM B1,B2,B3 WHERE B1.x = B2.x
AND B2.x = B3.x;
mytest=# EXPLAIN SELECT * FROM A,B,C WHERE A.x=B.x AND B.x=C.x AND f_leak(B.y);
QUERYPLAN 
------------------------------------------------------------------------------------Merge Join  (cost=381.80..9597.97
rows=586624width=108)  Merge Cond: (a.x = b1.x)  ->  Merge Join  (cost=170.85..290.46 rows=7564 width=72)        Merge
Cond:(a.x = c.x)        ->  Sort  (cost=85.43..88.50 rows=1230 width=36)              Sort Key: a.x              ->
SeqScan on a  (cost=0.00..22.30 rows=1230 width=36)        ->  Sort  (cost=85.43..88.50 rows=1230 width=36)
Sort Key: c.x              ->  Seq Scan on c  (cost=0.00..22.30 rows=1230 width=36)  ->  Materialize
(cost=210.95..528.56rows=15510 width=44)        ->  Merge Join  (cost=210.95..489.78 rows=15510 width=44)
MergeCond: (b1.x = b3.x)              ->  Merge Join  (cost=125.52..165.40 rows=2522 width=40)                    Merge
Cond:(b1.x = b2.x)                    ->  Sort  (cost=40.09..41.12 rows=410 width=36)                          Sort
Key:b1.x                          ->  Seq Scan on b1  (cost=0.00..22.30 
rows=410 width=36)                                Filter: f_leak(y)                    ->  Sort  (cost=85.43..88.50
rows=1230width=4)                          Sort Key: b2.x                          ->  Seq Scan on b2
(cost=0.00..22.30
rows=1230 width=4)              ->  Sort  (cost=85.43..88.50 rows=1230 width=4)                    Sort Key: b3.x
            ->  Seq Scan on b3  (cost=0.00..22.30 rows=1230 width=4) 
(25 rows)

In this example, f_leak() takes an argument come from B1 table within B view,
and it was correctly distributed to SeqScan on B1.

From perspective of the code, the *_collapse_limit affects the contents of
joinlist being returned from deconstruct_jointree() whether its sub-portion is
flatten, or not.
However, the qualifiers are distributed on distribute_restrictinfo_to_rels() to
RelOptInfo based on its dependency of relations being referenced by
arguments. Thus, the above f_leak() was distributed to B1, not B, because
its arguments come from only B1.


I agree with the following approach to tackle this problem in 100%.
However, I'm unclear how from/join_collapse_limit affects to keep
sub-queries unflatten. It seems to me it is determined based on
the result of is_simple_subquery().

> 1. Let quals percolate down into subqueries.
> 2. Add the notion of a security view, which prevents flattening and
> disables the optimization of patch #1
> 3. Add the notion of a leakproof function, which can benefit from the
> optimization of #1 even when the view involved is a security view as
> introduced in #2
>

Thanks,

2011/10/11 Robert Haas <robertmhaas@gmail.com>:
> On Mon, Oct 10, 2011 at 4:28 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> I agreed. We have been on the standpoint that tries to prevent
>> leakable functions to reference a portion of join-tree being already
>> flatten, however, it has been a tough work.
>> It seems to me it is much simple approach that enables to push
>> down only non-leaky functions into inside of sub-queries.
>>
>> An idea is to add a hack on distribute_qual_to_rels() to relocate
>> a qualifier into inside of the sub-query, when it references only
>> a particular sub-query being come from a security view, and
>> when the sub-query satisfies is_simple_subquery(), for example.
>
> If you can make this work, I think it could be a pretty sweet plannner
> optimization even apart from the implications for security views.
> Consider a query of this form:
>
> A LEFT JOIN B LEFT JOIN C
>
> where B is a view defined as:
>
> B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5
>
> Now let's suppose that from_collapse_limit/join_collapse_limit are set
> low enough that we decline to fold these subproblems together.  If
> there happens to be a qual B.x = 1, where B.x is really B1.x, then the
> generated plan sucks, because it will basically lose the ability to
> filter B1 early, very possibly on, say, a unique index.  Or at least a
> highly selective index.  If we could allow the B.x qual to trickle
> down inside of the subquery, we'd get a much better plan.  Of course,
> it's still not as good as flattening, because it won't allow us to
> consider as many possible join orders - but the whole point of having
> from_collapse_limit/join_collapse_limit in the first place is that we
> can't consider all the join orders without having planning time and
> memory usage balloon wildly out of control.  And in many real-world
> cases, I think that this would probably mitigate the effects of
> exceeding from_collapse_limit/join_collapse_limit quite a bit.
>
> In order to make it work, though, you'd need to arrange things so that
> we distribute quals to rels in the parent query, then let some of them
> filter down into the subquery, then distribute quals to rels in the
> subquery (possibly adjusting RTE indexes?), then finish planning the
> subquery, then finish planning the parent query.  Not sure how
> possible/straightforward that is.
>
> It's probably a good idea to deal with this part first, because if you
> can't make it work then the whole approach is in trouble.  I'm almost
> imagining that we could break this into three independent patches,
> like this:
>
> 1. Let quals percolate down into subqueries.
> 2. Add the notion of a security view, which prevents flattening and
> disables the optimization of patch #1
> 3. Add the notion of a leakproof function, which can benefit from the
> optimization of #1 even when the view involved is a security view as
> introduced in #2
>
> Unlike the way you have it now, I think those patches could be
> independently committable.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



--
KaiGai Kohei <kaigai@kaigai.gr.jp>


pgsql-hackers by date:

Previous
From: Chris Redekop
Date:
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Next
From: Tom Lane
Date:
Subject: Re: Pushing ScalarArrayOpExpr support into the btree index AM