Thread: how to control the execution plan ?

how to control the execution plan ?

From

"Sabin Coanda"

Date:

07 July 2008, 06:40:36

Hi there,

I try to execute the following statement:

SELECT *
FROM (   SELECT "MY_FUNCTION_A"(bp."COL_A", NULL::boolean) AS ALIAS_A   FROM "TABLE_A" bp       JOIN "TABLE_B" pn ON
bp."COL_B"= pn."PK_ID"       JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"   WHERE pn."Editor"::text ~~ 'Some%'::text
ANDbp."COL_A" IS NOT NULL AND 
 
bp."COL_A"::text <> ''::text
) x
WHERE (x.ALIAS_A::text ) IS NULL;

The problem is the excution plan first make Seq Scan on "TABLE_A", with 
Filter: (("COL_A" IS NOT NULL) AND (("COL_A")::text <> ''::text) AND 
(("MY_FUNCTION_A"("COL_A", NULL::boolean))::text IS NULL))". This way, 
MY_FUNCTION_A crashes for some unsupported data provided by  "COL_A".

I'd like to get an execution plan which is filtering first the desired rows, 
and just after that compute te column value "MY_FUNCTION_A"(bp."COL_A", 
NULL::boolean).

I made different combinations, including a subquery like:

SELECT *
FROM (   SELECT "MY_FUNCTION_A"(y."COL_A", NULL::boolean) AS ALIAS_A   FROM (       SELECT bp."COL_A"       FROM
"TABLE_A"bp           JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"           JOIN "TABLE_C" vbo ON bp."COL_C" =
vbo."PK_ID"      WHERE pn."COL_E"::text ~~ 'Some%'::text AND bp."COL_A" IS NOT NULL 
 
AND bp."COL_A"::text <> ''::text   ) y
) x
WHERE (x.ALIAS_A::text ) IS NULL;

but postgres analyze is too 'smart' and optimize it as in the previous case, 
with the same Seq Scan on "TABLE_A", and with the same filter.

I thought to change the function MY_FUNCTION_A, to support any argument 
data, but the even that another performance problem will be rised when the 
function will be computed for any row in join, even those that can be 
removed by other filter.

Do you have a solution please ?

TIA
Sabin

Re: how to control the execution plan ?

From

"Scott Marlowe"

Date:

07 July 2008, 18:14:10

On Mon, Jul 7, 2008 at 3:14 AM, Sabin Coanda <sabin.coanda@deuromedia.ro> wrote:
> Hi there,
>
> I try to execute the following statement:
>
> SELECT *
> FROM (
>    SELECT "MY_FUNCTION_A"(bp."COL_A", NULL::boolean) AS ALIAS_A
>    FROM "TABLE_A" bp
>        JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
>        JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
>    WHERE pn."Editor"::text ~~ 'Some%'::text AND bp."COL_A" IS NOT NULL AND
> bp."COL_A"::text <> ''::text
> ) x
> WHERE (x.ALIAS_A::text ) IS NULL;
>
> The problem is the excution plan first make Seq Scan on "TABLE_A", with
> Filter: (("COL_A" IS NOT NULL) AND (("COL_A")::text <> ''::text) AND
> (("MY_FUNCTION_A"("COL_A", NULL::boolean))::text IS NULL))". This way,
> MY_FUNCTION_A crashes for some unsupported data provided by  "COL_A".
>
> I'd like to get an execution plan which is filtering first the desired rows,
> and just after that compute te column value "MY_FUNCTION_A"(bp."COL_A",
> NULL::boolean).

Just wondering what the query plans look like here, both regular
explain, and if you can wait for it to execute, explain analyze.

I'm guessing that the function is not indexed / indexable.  Is it
marked immutable (and is it actually immutable) or stable (and is
stable)?

If it's immutable then you can create an index on it and that should
speed things up.

>
> I made different combinations, including a subquery like:
>
> SELECT *
> FROM (
>    SELECT "MY_FUNCTION_A"(y."COL_A", NULL::boolean) AS ALIAS_A
>    FROM (
>        SELECT bp."COL_A"
>        FROM "TABLE_A" bp
>            JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
>            JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
>        WHERE pn."COL_E"::text ~~ 'Some%'::text AND bp."COL_A" IS NOT NULL
> AND bp."COL_A"::text <> ''::text
>    ) y
> ) x
> WHERE (x.ALIAS_A::text ) IS NULL;
>
> but postgres analyze is too 'smart' and optimize it as in the previous case,
> with the same Seq Scan on "TABLE_A", and with the same filter.
>
> I thought to change the function MY_FUNCTION_A, to support any argument
> data, but the even that another performance problem will be rised when the
> function will be computed for any row in join, even those that can be
> removed by other filter.
>
> Do you have a solution please ?

If it's still to smart, you can run two queries, one to pull the set
you want to work with from the custom function into a temp table, then
analyze it, then run the query against that.
Not an optimal solution, but it might be the fastest if you can't
index your function.

Re: how to control the execution plan ?

From

"Sabin Coanda"

Date:

08 July 2008, 13:40:50

Hi Scott,

I add the answers below.

>
> Just wondering what the query plans look like here, both regular
> explain, and if you can wait for it to execute, explain analyze.
>

Just with explain, because the function craches when it is running:

"Merge Join  (cost=141.41..188.32 rows=1 width=24)"
"  Merge Cond: ("TABLE_C"."PK_ID" = bp."COL_C")"
"  ->  Merge Left Join  (cost=62.33..96.69 rows=1000 width=44)"
"        Merge Cond: (s."PK_ID" = "MY_FUNCTION_B".COL_D)"
"        ->  Index Scan using "TABLE_D_pkey" on "TABLE_D" s 
(cost=0.00..18.49 rows=349 width=4)"
"        ->  Sort  (cost=62.33..64.83 rows=1000 width=44)"
"              Sort Key: "MY_FUNCTION_B".COL_D"
"              ->  Function Scan on "MY_FUNCTION_B"  (cost=0.00..12.50 
rows=1000 width=44)"
"  ->  Sort  (cost=79.08..79.09 rows=1 width=28)"
"        Sort Key: bp."COL_C""
"        ->  Hash Join  (cost=10.59..79.07 rows=1 width=28)"
"              Hash Cond: (bp."COL_B" = pn."PK_ID")"
"              ->  Seq Scan on "TABLE_A" bp  (cost=0.00..68.46 rows=4 
width=32)"
"                    Filter: (("COL_A" IS NOT NULL) AND (("COL_A")::text <> 
''::text) AND ((("MY_FUNCTION_A"("COL_A", NULL::boolean))::text || ' 
'::text) IS NULL))"
"              ->  Hash  (cost=10.50..10.50 rows=7 width=4)"
"                    ->  Seq Scan on "TABLE_B" pn  (cost=0.00..10.50 rows=7 
width=4)"
"                          Filter: (("COL_E")::text ~~ 'Some%'::text)"

> I'm guessing that the function is not indexed / indexable.  Is it
> marked immutable (and is it actually immutable) or stable (and is
> stable)?
>

The function is marked stable.

> If it's still to smart, you can run two queries, one to pull the set
> you want to work with from the custom function into a temp table, then
> analyze it, then run the query against that.
> Not an optimal solution, but it might be the fastest if you can't
> index your function.
>

In fact I would use that statement to define a permanent view, not in a 
procedure.

Finally I found a trick specifying not just WHERE (x.ALIAS_A::text ) IS 
NULL;, but combining with a constant and a join with a constant.
By the way, it doesn't works just with the constant or with the join :(
See the query and the plan below:

EXPLAIN SELECT *
FROM (       SELECT "MY_FUNCTION_A"(bp."COL_A", NULL::boolean) AS ALIAS_A, MY_AUX       FROM "TABLE_A" bp
CROSSJOIN (               SELECT '*'::character varying AS MY_AUX           ) afp               JOIN "TABLE_B" pn ON
bp."COL_B"= pn."PK_ID"               JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"       WHERE pn."COL_E"::text ~~
'Some%'::textAND bp."COL_A" IS NOT NULL 
 
AND bp."COL_A"::text <> ''::text
) x
WHERE (x.ALIAS_A::text || ' ' || MY_AUX) IS NULL;


"Merge Join  (cost=131.68..178.60 rows=1 width=56)"
"  Merge Cond: ("TABLE_C"."PK_ID" = bp."COL_C")"
"  ->  Merge Left Join  (cost=62.33..96.69 rows=1000 width=44)"
"        Merge Cond: (s."PK_ID" = "MY_FUNCTION_B".COL_D)"
"        ->  Index Scan using "TABLE_D_pkey" on "TABLE_D" s 
(cost=0.00..18.49 rows=349 width=4)"
"        ->  Sort  (cost=62.33..64.83 rows=1000 width=44)"
"              Sort Key: "MY_FUNCTION_B".COL_D"
"              ->  Function Scan on "MY_FUNCTION_B"  (cost=0.00..12.50 
rows=1000 width=44)"
"  ->  Sort  (cost=69.36..69.36 rows=1 width=60)"
"        Sort Key: bp."COL_C""
"        ->  Nested Loop  (cost=10.59..69.34 rows=1 width=60)"
"              Join Filter: (((("MY_FUNCTION_A"(bp."COL_A", 
NULL::boolean))::text || ' '::text) || (afp.MY_AUX)::text) IS NULL)"
"              ->  Result  (cost=0.00..0.01 rows=1 width=0)"
"              ->  Hash Join  (cost=10.59..68.94 rows=22 width=28)"
"                    Hash Cond: (bp."COL_B" = pn."PK_ID")"
"                    ->  Seq Scan on "TABLE_A" bp  (cost=0.00..54.90 
rows=862 width=32)"
"                          Filter: (("COL_A" IS NOT NULL) AND 
(("COL_A")::text <> ''::text))"
"                    ->  Hash  (cost=10.50..10.50 rows=7 width=4)"
"                          ->  Seq Scan on "TABLE_B" pn  (cost=0.00..10.50 
rows=7 width=4)"
"                                Filter: (("COL_E")::text ~~ 'Some%'::text)"

However I'm not sure there are no circumstances when the execution plan will 
detect my trick and will optimize the query again :((

Sabin