Re: Expression Pruning in postgress - Mailing list pgsql-hackers

From HarmeekSingh Bedi
Subject Re: Expression Pruning in postgress
Msg-id CALLwk6tSx3_rmBpLwYViGhBBeidy2n64gPZaPDcwKELNC1iQmA@mail.gmail.com
In response to Re: Expression Pruning in postgress  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Expression Pruning in postgress
List pgsql-hackers
Hi Tom,

Thanks for your input; I appreciate your taking the time to respond. Just some comments:

  1. Maybe I am mistaken, so kindly help me understand a bit more. I agree that passing Datums up the node chain helps, but consider the case where a Sort or a Hash Join spills to disk: large columns that get written to disk will still cause a lot of performance issues (since sort spills will detoast), and many unnecessary columns cause a lot of I/O. With 1024 varchars and lots of rows, you can see the serial case deteriorate because of this.
  2. The parallel case works: the parallel nodes inherit the target list of the underlying nodes. But in my case the problem of unpruned columns becomes worse, because they also add to the network payload.
  3. Now, coming to your point about detoasting: I have to detoast at parallel node boundaries, because the dataflow operators (delimited by parallel operators) run on different machines and hence have to pass values by value.
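The following back-of-the-envelope arithmetic makes the I/O argument in points 1 and 2 concrete: every byte of an unneeded column is written again on each spill or network hop. It is a minimal sketch, assuming hypothetical column names, widths and a row count, and it ignores tuple headers, alignment padding and compression.

```python
def spill_bytes(rows: int, column_widths: dict[str, int], needed: set[str]) -> tuple[int, int]:
    """Return (bytes if every column is carried, bytes if only `needed` columns are)."""
    carried = sum(column_widths.values())
    pruned = sum(w for name, w in column_widths.items() if name in needed)
    return rows * carried, rows * pruned


# Hypothetical table: two narrow keys plus one wide column no node above needs.
widths = {"id": 4, "grp": 4, "payload": 1024}
unpruned, pruned = spill_bytes(1_000_000, widths, needed={"id", "grp"})
print(f"{unpruned:,} bytes unpruned vs {pruned:,} bytes pruned")
# 1,032,000,000 bytes unpruned vs 8,000,000 bytes pruned
```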
I did make a fix in the optimizer to at least alleviate this case. But I am going to work on a more general approach: expression pruning based on the lifetime of an expression. Basically, each node either references or generates expressions. Any expression that is generated but not referenced by any node above it will be eliminated.
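The lifetime-based pruning idea can be sketched as a single top-down pass over the plan: each node keeps only the outputs its parent requires, and asks its children for whatever it passes through, whatever its own quals or sort keys reference, and the inputs of any expression it computes itself. This is a minimal sketch on a toy plan representation; `PlanNode`, `prune` and the expression names are hypothetical and are not PostgreSQL's `Plan`/`TargetEntry` structures. The example mirrors the placeholder situation discussed in this thread: the wide raw column `t.big` is only needed at the scan to compute `ph`, so it stops travelling there.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    name: str
    targetlist: list[str]  # expressions this node currently emits upward
    # expressions this node's own quals, join clauses or sort keys use
    references: set[str] = field(default_factory=set)
    # expressions computed at this node -> the input expressions they read
    generates: dict[str, set[str]] = field(default_factory=dict)
    children: list["PlanNode"] = field(default_factory=list)


def prune(node: PlanNode, required: set[str]) -> None:
    """Top-down pass: drop every expression no node above still references."""
    # Keep only the outputs that some ancestor actually consumes.
    node.targetlist = [e for e in node.targetlist if e in required]
    # Work out what the children must supply to this node.
    needed: set[str] = set()
    for expr in set(node.targetlist) | node.references:
        if expr in node.generates:
            needed |= node.generates[expr]  # computed here: need its inputs
        else:
            needed.add(expr)  # passed through from a child
    for child in node.children:
        prune(child, needed)  # each child keeps only what it can supply


# Toy plan: Sort <- Hash Join <- (Scan t, Scan u).
# "ph" stands in for a placeholder expression computed from t.big at the scan.
scan_t = PlanNode("Scan t", ["t.a", "t.b", "t.big", "ph"],
                  generates={"ph": {"t.big"}})
scan_u = PlanNode("Scan u", ["u.a", "u.c", "u.d"])
join = PlanNode("Hash Join", ["t.b", "t.big", "ph", "u.c"],
                references={"t.a", "u.a"}, children=[scan_t, scan_u])
sort = PlanNode("Sort", ["t.b", "t.big", "ph", "u.c"],
                references={"t.b"}, children=[join])

prune(sort, required={"t.b", "ph", "u.c"})  # the query's final output list
for n in (sort, join, scan_t, scan_u):
    print(f"{n.name}: {n.targetlist}")
# Sort: ['t.b', 'ph', 'u.c']
# Hash Join: ['t.b', 'ph', 'u.c']
# Scan t: ['t.a', 't.b', 'ph']
# Scan u: ['u.a', 'u.c']
```

A top-down pass suits this problem because what a node must emit depends only on its ancestors; join keys such as `t.a` are requested from the scan but never emitted above the join.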

 Regards
 Harmeek

On Sun, Jul 10, 2011 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
HarmeekSingh Bedi <harmeeksingh@gmail.com> writes:
> Thanks Tom. Here is an example. Just some background: I have made
> changes in the Postgres execution and storage engines to make it an
> MPP-style engine, keeping the optimizer intact. Basically, I take the
> Postgres serial plan and construct a parallel plan. The query I am
> running is below.

The output lists for the parallel nodes look pretty broken, but I guess
you weren't asking about those.  As near as I can tell, what you're
unhappy about is that it's passing up both raw column values and
pre-evaluated placeholder expressions using those values, when only the
placeholders are really going to be needed.  Yeah, that's probably true,
because the placeholder mechanism isn't (yet) taken into account by the
code that determines how far up a column value will be needed.

In standard Postgres this isn't much of an issue because passing up
by-reference Datums is really quite cheap ... it's only a pointer copy
in many cases, and even where it's not, it's probably just a
toast-pointer copy.  I suspect it's costing you more because your
"parallel" nodes have to instantiate the tuples instead of just passing
virtual slots around ... but it's still not clear to me why you're
passing more than a toast pointer for big values.  Maybe you're being
too enthusiastic about detoasting pointers early?

                       regards, tom lane
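Tom's point about cheap pass-up can be sketched as follows: inside one process a large value can travel as a small reference, and it only has to be expanded when a row actually leaves the machine, and then only for the columns that survive pruning. This is a minimal sketch; `ToastRef`, `ToastStore` and `ship` are hypothetical stand-ins and are not PostgreSQL's actual TOAST pointer format or detoasting API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToastRef:
    """Hypothetical stand-in for a TOAST pointer: a small handle, not the value."""
    key: int


class ToastStore:
    """Hypothetical out-of-line storage reachable only on the local machine."""

    def __init__(self) -> None:
        self.values: dict[int, str] = {}
        self.fetches = 0  # how many times a large value was actually read

    def detoast(self, value):
        if isinstance(value, ToastRef):
            self.fetches += 1
            return self.values[value.key]
        return value  # already an inline value


def ship(row: dict, columns: list[str], store: ToastStore) -> dict:
    """At a cross-machine boundary, send only the listed columns, by value."""
    return {c: store.detoast(row[c]) for c in columns}


store = ToastStore()
store.values[1] = "x" * 100_000
row = {"id": 42, "big": ToastRef(1), "ph": "x" * 10}  # locally, "big" is just a handle

pruned = ship(row, ["id", "ph"], store)
print(store.fetches)  # 0: "big" was pruned, so it was never read
full = ship(row, ["id", "big", "ph"], store)
print(store.fetches)  # 1: shipping "big" forced a full fetch
```

Detoasting only at the boundary, after pruning, means a wide column that no remote operator needs is never read at all.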
