Re: Expression Pruning in postgress - Mailing list pgsql-hackers
From | HarmeekSingh Bedi |
---|---|
Subject | Re: Expression Pruning in postgress |
Date | |
Msg-id | CALLwk6tSx3_rmBpLwYViGhBBeidy2n64gPZaPDcwKELNC1iQmA@mail.gmail.com |
In response to | Re: Expression Pruning in postgress (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Expression Pruning in postgress
|
List | pgsql-hackers |
Hi Tom,<br /><br />Thanks for your input. I appreciate you taking the time to respond. Just some comments.<br /><br />
<ol>
<li>Maybe I am mistaken; kindly help me understand a bit more. I do agree that passing Datums up the node chain helps, but consider the case where a Sort or Hash Join spills to disk: large columns that get written to disk will still cause a lot of performance issues (as sort spills will detoast), and a lot of unnecessary columns will cause a lot of I/O. With 1024 varchars and a lot of rows, you can see that the serial case deteriorates because of this.</li>
<li>The parallel case works: the parallel nodes inherit the target list of the underlying nodes. But in my case the issue of non-pruned columns becomes worse, as they also add to the network payload.</li>
<li>Now, coming to your point about detoasting: I have to do that at parallel node boundaries, because the data flow operators (delimited by parallel operators) run on different machines and hence have to pass data by value.</li>
</ol>
I did make a fix in the optimizer to at least alleviate this case. But I am going to work on a more general approach of expression pruning based on the lifetime of an expression. Basically, each node will either reference or generate an expression. Any expression that is generated and is not referenced by any node above it will be eliminated.<br /><br /> Regards<br /> Harmeek<br /><br />
<div class="gmail_quote">On Sun, Jul 10, 2011 at 10:28 AM, Tom Lane <span dir="ltr">&lt;<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>&gt;</span> wrote:<br />
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">HarmeekSingh Bedi &lt;<a href="mailto:harmeeksingh@gmail.com">harmeeksingh@gmail.com</a>&gt; writes:<br /></div>
<div class="im">&gt; Thanks Tom. Here is an example. Just a background of things. I have made<br /> &gt; changes in the Postgres execution and storage engine to make it an MPP-style<br /> &gt; engine, keeping all of the optimizer intact. Basically take the Postgres serial plan and<br /> &gt; construct a parallel plan. The query I am running is below.<br /><br /></div>
The output lists for the parallel nodes look pretty broken, but I guess<br /> you weren't asking about those. As near as I can tell, what you're<br /> unhappy about is that it's passing up both raw column values and<br /> pre-evaluated placeholder expressions using those values, when only the<br /> placeholders are really going to be needed. Yeah, that's probably true,<br /> because the placeholder mechanism isn't (yet) taken into account by the<br /> code that determines how far up a column value will be needed.<br /><br />
In standard Postgres this isn't much of an issue because passing up<br /> by-reference Datums is really quite cheap ... it's only a pointer copy<br /> in many cases, and even where it's not, it's probably just a<br /> toast-pointer copy. I suspect it's costing you more because your<br /> "parallel" nodes have to instantiate the tuples instead of just passing<br /> virtual slots around ... but it's still not clear to me why you're<br /> passing more than a toast pointer for big values. Maybe you're being<br /> too enthusiastic about detoasting pointers early?<br /><br /> regards, tom lane<br />
</blockquote></div><br />
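The lifetime-based pruning idea from the reply above (each node either references or generates an expression, and a generated expression that no node above references is dropped) can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL planner code: `PlanNode`, `available`, `prune`, and the column names `a`, `big_col`, and `ph1` are all hypothetical names invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan node; not a PostgreSQL structure."""
    name: str
    generates: set[str]    # expressions this node produces (columns, placeholders)
    references: set[str]   # expressions this node itself evaluates (sort keys, quals)
    children: list[PlanNode] = field(default_factory=list)
    output: set[str] = field(default_factory=set)  # pruned target list


def available(node: PlanNode) -> set[str]:
    """Everything generated at or below this node."""
    out = set(node.generates)
    for child in node.children:
        out |= available(child)
    return out


def prune(node: PlanNode, required_above: set[str]) -> None:
    """Top-down pass: a node emits only what some node above it references."""
    node.output = required_above & available(node)
    # Children must supply what this node emits or evaluates,
    # except for expressions this node generates itself.
    needed_below = (node.output | node.references) - node.generates
    for child in node.children:
        prune(child, needed_below)


# Assumed example: the scan produces a sort key "a", a wide column "big_col",
# and a placeholder "ph1" computed from big_col. Only ph1 is needed at the top.
scan = PlanNode("SeqScan", generates={"a", "big_col", "ph1"}, references={"big_col"})
sort = PlanNode("Sort", generates=set(), references={"a"}, children=[scan])
prune(sort, required_above={"ph1"})

print(sorted(scan.output))
print(sorted(sort.output))
```

In this sketch the wide column never leaves the scan, so it would not be written to a sort spill file or sent across a parallel node boundary; the sort key `a` survives only as far as the Sort node that references it.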