Thread: Expression Pruning in postgress

Expression Pruning in postgress

From

HarmeekSingh Bedi

Date:

07 July 2011, 07:18:06

Hi .
 Apology for the bandwith. Did not know who to ask the question . I
was interested in a brief detail on how postgress does expression
pruning between nodes . The basic question is as follows

Scenerio
  In a plan where Node 1 is parent {say join) and Node 2 is child
(say scan)  . If node 1 has a expression say foo(column) then scan
will project 'column'  for sure and join will     evaluate foo(column).  Now if the node above join does not need
column ? how does postgress remove the column from join's projection
list . I am seeing that it does not in many     cases specially when sort appears above.
Question    In general is there something in the code that can process a plan
tree and find out where expressions are generated and where there
references are not required - thus removing expressions not needed .
In my case the column is a huge column and hence it will matter if it
is removed from the projection list as soon as possible ?

Any insights welcome and thanks for all help

Regards
Harmeek

Re: Expression Pruning in postgress

From

Tom Lane

Date:

07 July 2011, 14:25:03

HarmeekSingh Bedi <harmeeksingh@gmail.com> writes:
>    In a plan where Node 1 is parent {say join) and Node 2 is child
> (say scan)  . If node 1 has a expression say foo(column) then scan
> will project 'column'  for sure and join will
>       evaluate foo(column).  Now if the node above join does not need
> column ? how does postgress remove the column from join's projection
> list .

See build_joinrel_tlist() in relnode.c.

> I am seeing that it does not in many
>       cases specially when sort appears above.

Please show a concrete example.
        regards, tom lane

Re: Expression Pruning in postgress

From

HarmeekSingh Bedi

Date:

10 July 2011, 07:38:41

<br />Thanks Tom. Here is a example. Just a background of things . I have made changes in postgress execution and
storageengine to make it a MPP style engine - keeping all optimizer intact. Basically take pgress serial plan and
constructa parallel plan. The query I am running is below.<br /><br /><u><b>Query</b></u><br /><br /># explain select
ooj.local_time,ooj.is_timeout,ooj.capsulename,<br/>       regexp_split_to_table(<br />            case<br />          
    when ooj.isubstr is null then 'none'<br />                when ooj.isubstr='' then 'none'<br />                else
ooj.isubstr<br/>            end ,';') as interest,<br />       sum(ooj.impression) count_impressions,<br />      
sum(ooj.click)count_clicks<br />from (<br />    select (impression.server_utc_time/3600000)*3600 as local_time,<br />  
        impression.is_timeout_default_impression as is_timeout,<br />            impression.capsule_name as
capsulename,<br/>            substring(impression.website_variables from 'ybt=([0-9;]*)') as isubstr,<br />          
 1as impression,<br />            case click.impression_id when null then 0 else 1 end as click<br />   from<br />    
      impression_ytw2_row impression<br />   left outer join  clicks_row click<br />   on impression.impression_id =
click.impression_id)ooj<br /> group by local_time, is_timeout, capsulename, interest;<br /><br /><u><b>Now if you
kindlybear with me and ignore the Parallel nodes in the plan {which are just parallel distributors} . If you compare
thetwo plans below and check the output tupledesc of the Hash join you will see that some columns are not gettign
prunedout - In my case this is a big column. <br /></b></u><br /><b>Case 1 Plan {inline view gets merged} </b><br /><br
/>  Now when the inline view gets merged via pullup_subqueries I can see that we have columns
{impression.website_variables}which should get pruned out but it does not after the JOIN { it gets pruned out after
hashaggregate). <br /><br /> Parallel reciever RANDOM queue  (nodenum=0 cost=132.79..133.12 rows=10 width=104)<br />  
Output:(sum(1)), impression.is_timeout_default_impression, impression.capsule_name, (sum(1)), sum(1), sum(1)<br />  
-> Parallel sender RANDOM queue  (nodenum=1 cost=132.79..133.12 rows=10 width=104)<br />          Output: (sum(1)),
impression.is_timeout_default_impression,impression.capsule_name, (sum(1)), sum(1), sum(1)<br />         -> 
HashAggregate (nodenum=2 cost=132.79..133.12 rows=10 width=104)<br />               Output:
(((impression.server_utc_time/ 3600000) * 3600)), impression.is_timeout_default_impression, impression.capsule_name,
(regexp_split_to_table(CASEWHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL<br />
 THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN
'none'::textELSE "substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) END, ';'::text)), sum(1),
sum(1)<br/>                ->  Parallel reciever HASH-AGG queue  (nodenum=3 cost=117.45..130.08 rows=181
width=104)<br/>                     Output: impression.server_utc_time, impression.is_timeout_default_impression,
impression.capsule_name,<b>impression.website_variables</b>, ((impression.server_utc_time / 3600000) * 3600),
regexp_split_to_table(CASEWHEN ("substring"((impres<br /> ion.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL)
THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN
'none'::textELSE "substring"((impression.website_variables)::text, 'ybt=([0-9;<br /> *)'::text) END, ';'::text)<br
/>                    ->  Parallel sender HASH-AGG queue  (nodenum=4 cost=117.45..130.08 rows=181 width=104)<br
/>                          Output: impression.server_utc_time, impression.is_timeout_default_impression,
impression.capsule_name,<b>impression.website_variables</b>, ((impression.server_utc_time / 3600000) * 3600),
regexp_split_to_table(CASEWHEN ("substring"((<br /> mpression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL)
THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN
'none'::textELSE "substring"((impression.website_variables)::text, 'ybt=<br /> [0-9;]*)'::text) END, ';'::text)<br
/>                          <b>->  Hash Left Join  (nodenum=5 cost=117.45..130.08 rows=181 width=104)<br
/>                                Output: impression.server_utc_time, impression.is_timeout_default_impression,
impression.capsule_name,impression.website_variables, ((impression.server_utc_time / 3600000) * 3600),
regexp_split_to_table(CASEWHEN ("substr<br /> ng"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL)
THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN
'none'::textELSE "substring"((impression.website_variables)::text,</b><br /> 'ybt=([0-9;]*)'::text) END, ';'::text)<br
/>                                ->  Parallel reciever HASH-LEFT queue  (nodenum=6 cost=0.00..3.10 rows=10
width=136)<br/>                                       Output: impression.server_utc_time,
impression.is_timeout_default_impression,impression.capsule_name, impression.website_variables,
impression.impression_id<br/>                                        ->  Parallel sender HASH-LEFT queue  (nodenum=7
cost=0.00..3.10rows=10 width=136)<br />                                             Output: impression.server_utc_time,
impression.is_timeout_default_impression,impression.capsule_name, impression.website_variables,
impression.impression_id<br/>                                              ->  Seq Scan on impression_ytw2_row
impression (nodenum=8 cost=0.00..3.10 rows=10 width=136)<br />                                                  
Output:impression.server_utc_time, impression.is_timeout_default_impression, impression.capsule_name,
impression.website_variables,impression.impression_id<br />                                  ->  Hash  (nodenum=9
cost=72.20..72.20rows=3620 width=32)<br />                                       Output: click.impression_id<br
/>                                      ->  Parallel reciever HASH-RIGHT queue  (nodenum=10 cost=0.00..72.20
rows=3620width=32)<br />                                              Output: click.impression_id<br
/>                                            ->  Parallel sender HASH-RIGHT queue  (nodenum=11 cost=0.00..72.20
rows=3620width=32)<br />                                                   Output: click.impression_id<br />
                                                  ->  Seq Scan on clicks_row click  (nodenum=12 cost=0.00..72.20
rows=3620width=32)<br />                                                         Output: click.impression_id<br /><br
/><u><b>Case2 { PLan when I disable code to merge view - basically do not call pullup_subqueries - it does the right
thing}<br /><br /></b></u> Parallel reciever RANDOM queue  (nodenum=0 cost=133.70..137.32 rows=181 width=112)<br />   
Output:ooj.local_time, ooj.is_timeout, ooj.capsulename, (), sum((sum((sum(ooj.impression))))),
sum((sum((sum(ooj.click)))))<br/>   ->  Parallel sender RANDOM queue  (nodenum=1 cost=133.70..137.32 rows=181
width=112)<br/>          Output: ooj.local_time, ooj.is_timeout, ooj.capsulename, (), sum((sum(ooj.impression))),
sum((sum(ooj.click)))<br/>         ->  HashAggregate  (nodenum=2 cost=133.70..137.32 rows=181 width=112)<br
/>              Output: ooj.local_time, ooj.is_timeout, ooj.capsulename, (regexp_split_to_table(CASE WHEN (ooj.isubstr
ISNULL) THEN 'none'::text WHEN (ooj.isubstr = ''::text) THEN 'none'::text ELSE ooj.isubstr END, ';'::text)),
sum(ooj.impression),sum(oo<br /> .click)<br />               ->  Parallel reciever HASH-AGG queue  (nodenum=3
cost=117.45..130.98rows=181 width=112)<br />                     Output: ooj.local_time, ooj.is_timeout,
ooj.capsulename,ooj.isubstr, ooj.impression, ooj.click, regexp_split_to_table(CASE WHEN (ooj.isubstr IS NULL) THEN
'none'::textWHEN (ooj.isubstr = ''::text) THEN 'none'::text ELSE ooj.isubstr <br /> ND, ';'::text)<br
/>                    ->  Parallel sender HASH-AGG queue  (nodenum=4 cost=117.45..130.98 rows=181 width=112)<br
/>                          Output: ooj.local_time, ooj.is_timeout, ooj.capsulename, ooj.isubstr, ooj.impression,
ooj.click,regexp_split_to_table(CASE WHEN (ooj.isubstr IS NULL) THEN 'none'::text WHEN (ooj.isubstr = ''::text) THEN
'none'::textELSE <a href="http://ooj.is">ooj.is</a><br /> bstr END, ';'::text)<br />                           -> 
SubqueryScan ooj  (nodenum=5 cost=117.45..130.98 rows=181 width=112)<br />                                 Output:
ooj.local_time,ooj.is_timeout, ooj.capsulename, ooj.isubstr, ooj.impression, ooj.click, regexp_split_to_table(CASE WHEN
(ooj.isubstrIS NULL) THEN 'none'::text WHEN (ooj.isubstr = ''::text) THEN 'none'::text ELSE <br /> oj.isubstr END,
';'::text)<br/>                                 <b>->  Hash Left Join  (nodenum=6 cost=117.45..128.27 rows=181
width=104)<br/>                                       Output: ((impression.server_utc_time / 3600000) * 3600),
impression.is_timeout_default_impression,impression.capsule_name, "substring"((impression.website_variables)::text,
'ybt=([0-9;]*)'::text),1, 1</b><br />                                        ->  Parallel reciever HASH-LEFT queue 
(nodenum=7cost=0.00..3.10 rows=10 width=136)<br />                                             Output:
impression.server_utc_time,impression.is_timeout_default_impression, impression.capsule_name,
impression.website_variables,impression.impression_id<br />                                              -> 
Parallelsender HASH-LEFT queue  (nodenum=8 cost=0.00..3.10 rows=10 width=136)<br
/>                                                  Output: impression.server_utc_time,
impression.is_timeout_default_impression,impression.capsule_name, impression.website_variables,
impression.impression_id<br/>                                                    ->  Seq Scan on impression_ytw2_row
impression (nodenum=9 cost=0.00..3.10 rows=10 width=136)<br />                                                        
Output:impression.server_utc_time, impression.is_timeout_default_impression, impression.capsule_name,
impression.website_variables,impression.impression_id<br />                                        ->  Hash 
(nodenum=10cost=72.20..72.20 rows=3620 width=32)<br />                                             Output:
click.impression_id<br/>                                             ->  Parallel reciever HASH-RIGHT queue 
(nodenum=11cost=0.00..72.20 rows=3620 width=32)<br />                                                    Output:
click.impression_id<br/>                                                   ->  Parallel sender HASH-RIGHT queue 
(nodenum=12cost=0.00..72.20 rows=3620 width=32)<br />                                                          Output:
click.impression_id<br/>                                                         ->  Seq Scan on clicks_row click 
(nodenum=13cost=0.00..72.20 rows=3620 width=32)<br />                                                               
Output:click.impression_id<br /><br /><u><b>More analysis<br /><br /></b></u><ol><li>Looked at code in make_join_rel
andbuild_joinrel_tlist I can see in one case the bms_nonempty_difference code kicks in and finds that the column can be
prunedout and other case it does not do the right thing.<li>I am still trying to work out why pullup_subquery should
makea difference.</ol>Any pointers appreciated .<br /><br />Regards<br />Harmeek<br /><br />2011 at 7:24 AM, Tom Lane
<<ahref="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br /> > HarmeekSingh Bedi <<a
href="mailto:harmeeksingh@gmail.com">harmeeksingh@gmail.com</a>>writes:<br />>>    In a plan where Node 1 is
parent{say join) and Node 2 is child<br />>> (say scan)  . If node 1 has a expression say foo(column) then
scan<br/> >> will project 'column'  for sure and join will<br />>>       evaluate foo(column).  Now if the
nodeabove join does not need<br />>> column ? how does postgress remove the column from join's projection<br />
>>list .<br />><br />> See build_joinrel_tlist() in relnode.c.<br />><br />>> I am seeing that it
doesnot in many<br />>>       cases specially when sort appears above.<br />><br />> Please show a concrete
example.<br/> ><br />>                        regards, tom lane<br />><br /><br />

Re: Expression Pruning in postgress

From

Tom Lane

Date:

10 July 2011, 17:28:47

HarmeekSingh Bedi <harmeeksingh@gmail.com> writes:
> Thanks Tom. Here is a example. Just a background of things . I have made
> changes in postgress execution and storage engine to make it a MPP style
> engine - keeping all optimizer intact. Basically take pgress serial plan and
> construct a parallel plan. The query I am running is below.

The output lists for the parallel nodes look pretty broken, but I guess
you weren't asking about those.  As near as I can tell, what you're
unhappy about is that it's passing up both raw column values and
pre-evaluated placeholder expressions using those values, when only the
placeholders are really going to be needed.  Yeah, that's probably true,
because the placeholder mechanism isn't (yet) taken into account by the
code that determines how far up a column value will be needed.

In standard Postgres this isn't much of an issue because passing up
by-reference Datums is really quite cheap ... it's only a pointer copy
in many cases, and even where it's not, it's probably just a
toast-pointer copy.  I suspect it's costing you more because your
"parallel" nodes have to instantiate the tuples instead of just passing
virtual slots around ... but it's still not clear to me why you're
passing more than a toast pointer for big values.  Maybe you're being
too enthusiastic about detoasting pointers early?
        regards, tom lane

Re: Expression Pruning in postgress

From

HarmeekSingh Bedi

Date:

13 July 2011, 15:45:19

Hi tom .<br /><br />Thanks for your input . Appreciate your taking time and responding . Just some comments.<br /><br
/><ol><li>Maybe I am mistaken Kindly  help me understand a bit more. I do agree that passing datums up the node chain
helps- but consider the case when either Sort or Hash joins spills on disk - large columns that get written on to the
diskwill still cause a lot of performance issues {as sorts spills will detoast} - lot of unnecessary columns will cause
lotof I/O. 1024 varchars and lot of rows and you can see that serial case detoriates due to this.<li>The parallel case
works- the parallel nodes inherit the target list of the underlying nodes  - but in my case the issue of non pruned
columnbecomes worse as it also adds to network payload which is worse. <br /><li> Now coming to your detoast . I have
todo that at parallel node boundaries as the data flow operators {delimited by parallel operators} run on different
machinesand hence has to pass by value. </ol>I did make a fix at least to alleviate this case in the optimizer . But I
amgoing to work on a more general approach of expression pruning based on the lifetime of an expression. Basically each
nodewill either references or generate an expression. Any expression that is generated and is not referenced by any top
ontop will be eliminated. <br /><br /> Regards<br /> Harmeek<br /><br /><div class="gmail_quote">On Sun, Jul 10, 2011
at10:28 AM, Tom Lane <span dir="ltr"><<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left:1ex;"><div class="im">HarmeekSingh Bedi <<a
href="mailto:harmeeksingh@gmail.com">harmeeksingh@gmail.com</a>>writes:<br /></div><div class="im">> Thanks Tom.
Hereis a example. Just a background of things . I have made<br /> > changes in postgress execution and storage
engineto make it a MPP style<br /> > engine - keeping all optimizer intact. Basically take pgress serial plan and<br
/>> construct a parallel plan. The query I am running is below.<br /><br /></div>The output lists for the parallel
nodeslook pretty broken, but I guess<br /> you weren't asking about those.  As near as I can tell, what you're<br />
unhappyabout is that it's passing up both raw column values and<br /> pre-evaluated placeholder expressions using those
values,when only the<br /> placeholders are really going to be needed.  Yeah, that's probably true,<br /> because the
placeholdermechanism isn't (yet) taken into account by the<br /> code that determines how far up a column value will be
needed.<br/><br /> In standard Postgres this isn't much of an issue because passing up<br /> by-reference Datums is
reallyquite cheap ... it's only a pointer copy<br /> in many cases, and even where it's not, it's probably just a<br />
toast-pointercopy.  I suspect it's costing you more because your<br /> "parallel" nodes have to instantiate the tuples
insteadof just passing<br /> virtual slots around ... but it's still not clear to me why you're<br /> passing more than
atoast pointer for big values.  Maybe you're being<br /> too enthusiastic about detoasting pointers early?<br /><br />
                      regards, tom lane<br /></blockquote></div><br />

Re: Expression Pruning in postgress

From

Tom Lane

Date:

13 July 2011, 15:59:30

HarmeekSingh Bedi <harmeeksingh@gmail.com> writes:
>    1. May be I am mistaken Kindly  help me understand a bit more. I do agree
>    that passing datums up the node chain helps - but consider the case when
>    either Sort or Hash joins spills on disk - large columns that get written on
>    to the disk will still cause a lot of performance issues {as sorts spills
>    will detoast}

No, they don't.  What gets sent to disk is normally just the toast
pointer datum (19 bytes or whatever it is these days).

> I did make a fix at least to alleviate this case in the optimizer . But I am
> going to work on a more general approach of expression pruning based on the
> lifetime of an expression. Basically each node will either references or
> generate an expression. Any expression that is generated and is not
> referenced by any top on top will be eliminated.

Sounds like overkill.
        regards, tom lane

Re: Expression Pruning in postgress

From

Tom Lane

Date:

13 July 2011, 17:43:20

I wrote:
> HarmeekSingh Bedi <harmeeksingh@gmail.com> writes:
>> I did make a fix at least to alleviate this case in the optimizer . But I am
>> going to work on a more general approach of expression pruning based on the
>> lifetime of an expression. Basically each node will either references or
>> generate an expression. Any expression that is generated and is not
>> referenced by any top on top will be eliminated.

> Sounds like overkill.

BTW, I looked a little more closely at the example you posted earlier,
and it's a lot simpler than I initially thought.  The actual issue seems
to not be anything to do with placeholders; it's just that
make_subplanTargetList() is lazy about deciding what to put into the
targetlist that will be passed to query_planner.  Per its comment,
* For example, given a query like*        SELECT a+b,SUM(c+d) FROM table GROUP BY a+b;* we want to pass this targetlist
tothe subplan:*        a,b,c,d,a+b* where the a+b target will be used by the Sort/Group steps, and the* other targets
willbe used for computing the final results.    (In the* above example we could theoretically suppress the a and b
targetsand* pass down only c,d,a+b, but it's not really worth the trouble to* eliminate simple var references from the
subplan. We will avoid doing* the extra computation to recompute a+b at the outer level; see* fix_upper_expr() in
setrefs.c.)

The extra variables you're complaining about are all used in GROUP BY
expressions, so the above describes exactly the behavior you're seeing.

Possibly it wouldn't be too hard to separate the GROUP BY targetlist
items from the rest and only apply flatten_tlist to the items that
aren't GROUP BY targets.  I'm unconvinced that the use case is wide
enough to be worth the trouble, though.
        regards, tom lane