Thread: minimizing the target list for foreign data wrappers

minimizing the target list for foreign data wrappers

From
David Gudeman
Date:
A few years ago I wrote a roll-your-own foreign-data-wrapper system for Postgres because Postgres didn't have one at the time (some details here if anyone is interested). Now I'm being tasked to move it to Postgres 9.2.x and I'd like to use FDW if possible.

One of the problems I'm having is that in my application, the foreign tables typically have hundreds of columns while typical queries only access a dozen or so (the foreign server is a columnar SQL database). Furthermore, there is no size optimization for NULL values passed back from the foreign server, so if I return all of the columns from the table --even as NULLs-- the returned data size will be several times the size that it needs to be. My application cannot tolerate this level of inefficiency, so I need to return minimal columns from the foreign table.

The documentation doesn't say how to do this, but looking at the code I think it is possible. In GetForeignPlan() you have to pass on the tlist argument, which I presume means that the query plan will use the tlist that I pass in, right? If so, then it should be possible for me to write a function that takes tlist and baserel->reltargetlist and return a version of tlist that knows which foreign-table columns are actually used, and replaces the rest with a NULL constant.

For example, suppose the original tlist is this: [VAR(attrno=1), VAR(attrno=2), VAR(attrno=3)] and reltarget list says that I only need args 1 and 3. Then the new tlist would look like this: [VAR(attrno=1), CONST(val=NULL), VAR(attrno=2)] where the attrno of the last VAR has been reduced by one because the 2 column is no longer there.

I did something very much like this in my roll-your-own version of FDW so I know basically how to do it, but I did it at the pre-planning stage and I'm not sure how much is already packed into the other plan nodes at this point. Maybe it's too late to change the target list?

Can anyone give me some advice or warnings on this? I'd hate to go to the trouble of implementing and testing it only to find that I'm making some bogus assumptions.

Thanks,
David Gudeman

Re: minimizing the target list for foreign data wrappers

From
David Gudeman
Date:
Re-reading my first email I thought it was a little confusing, so here
is some clarification. In GetForeignPlan, tlist seems to be a target
list for a basic "select *" from the foreign table. For the ith
TargetEntry te in tlist, it seems that te->expr is a var with
varattno=i. I was mis-remembering and calling varattno "attrno" in the
original email.

My assumption is that the plan elements that use the output of the FDW
plan node will access columns indirectly using tlist. In other words,
I'm assuming that if there is a reference to a column c of the foreign
table, this column will be represented as a Var with varattno being an
offset into tlist. So if c is column number 3, for example, you get
its value by looking up TargetEntry number 3 in tlist and evaluate the
expr column for that TargetEntry. So if I change the  Var in the expr
column so the varattno points to a different column in the output
tuple, then everything will work.

The two risky assumptions I'm making are 1. that it actually uses this
indirect way of looking up columns in a foreign table and 2. that it
actually uses the tlist that I pass in when I call make_foreignscan().

Can anyone confirm or deny these assumptions?

Thanks.

On Sun, Apr 21, 2013 at 6:57 PM, David Gudeman <dave.gudeman@gmail.com> wrote:
>
> A few years ago I wrote a roll-your-own foreign-data-wrapper system for Postgres because Postgres didn't have one at
thetime (some details here (http://unobtainabol.blogspot.com/2013/04/dave-foreign-data-introuction.html) if anyone is
interested).Now I'm being tasked to move it to Postgres 9.2.x and I'd like to use FDW if possible. 
>
> One of the problems I'm having is that in my application, the foreign tables typically have hundreds of columns while
typicalqueries only access a dozen or so (the foreign server is a columnar SQL database). Furthermore, there is no size
optimizationfor NULL values passed back from the foreign server, so if I return all of the columns from the table
--evenas NULLs-- the returned data size will be several times the size that it needs to be. My application cannot
toleratethis level of inefficiency, so I need to return minimal columns from the foreign table. 
>
> The documentation doesn't say how to do this, but looking at the code I think it is possible. In GetForeignPlan() you
haveto pass on the tlist argument, which I presume means that the query plan will use the tlist that I pass in, right?
Ifso, then it should be possible for me to write a function that takes tlist and baserel->reltargetlist and return a
versionof tlist that knows which foreign-table columns are actually used, and replaces the rest with a NULL constant. 
>
> For example, suppose the original tlist is this: [VAR(attrno=1), VAR(attrno=2), VAR(attrno=3)] and reltarget list
saysthat I only need args 1 and 3. Then the new tlist would look like this: [VAR(attrno=1), CONST(val=NULL),
VAR(attrno=2)]where the attrno of the last VAR has been reduced by one because the 2 column is no longer there. 
>
> I did something very much like this in my roll-your-own version of FDW so I know basically how to do it, but I did it
atthe pre-planning stage and I'm not sure how much is already packed into the other plan nodes at this point. Maybe
it'stoo late to change the target list? 
>
> Can anyone give me some advice or warnings on this? I'd hate to go to the trouble of implementing and testing it only
tofind that I'm making some bogus assumptions. 
>
> Thanks,
> David Gudeman
>



Re: minimizing the target list for foreign data wrappers

From
David Gudeman
Date:
In case anyone is interested, I tried it and it doesn't seem to work.
It looks like some other plan element already has the target-list
tuple baked. Now I'm trying to decide whether to give up on FDW. It's
a shame because it's such a sweet facility, but at this point, I just
don't think that it's mature enough for what I need to do.
Regards,
David Gudeman


On Mon, Apr 22, 2013 at 11:27 AM, David Gudeman <dave.gudeman@gmail.com> wrote:
> Re-reading my first email I thought it was a little confusing, so here
> is some clarification. In GetForeignPlan, tlist seems to be a target
> list for a basic "select *" from the foreign table. For the ith
> TargetEntry te in tlist, it seems that te->expr is a var with
> varattno=i. I was mis-remembering and calling varattno "attrno" in the
> original email.
>
> My assumption is that the plan elements that use the output of the FDW
> plan node will access columns indirectly using tlist. In other words,
> I'm assuming that if there is a reference to a column c of the foreign
> table, this column will be represented as a Var with varattno being an
> offset into tlist. So if c is column number 3, for example, you get
> its value by looking up TargetEntry number 3 in tlist and evaluate the
> expr column for that TargetEntry. So if I change the  Var in the expr
> column so the varattno points to a different column in the output
> tuple, then everything will work.
>
> The two risky assumptions I'm making are 1. that it actually uses this
> indirect way of looking up columns in a foreign table and 2. that it
> actually uses the tlist that I pass in when I call make_foreignscan().
>
> Can anyone confirm or deny these assumptions?
>
> Thanks.
>
> On Sun, Apr 21, 2013 at 6:57 PM, David Gudeman <dave.gudeman@gmail.com> wrote:
>>
>> A few years ago I wrote a roll-your-own foreign-data-wrapper system for Postgres because Postgres didn't have one at
thetime (some details here (http://unobtainabol.blogspot.com/2013/04/dave-foreign-data-introuction.html) if anyone is
interested).Now I'm being tasked to move it to Postgres 9.2.x and I'd like to use FDW if possible. 
>>
>> One of the problems I'm having is that in my application, the foreign tables typically have hundreds of columns
whiletypical queries only access a dozen or so (the foreign server is a columnar SQL database). Furthermore, there is
nosize optimization for NULL values passed back from the foreign server, so if I return all of the columns from the
table--even as NULLs-- the returned data size will be several times the size that it needs to be. My application cannot
toleratethis level of inefficiency, so I need to return minimal columns from the foreign table. 
>>
>> The documentation doesn't say how to do this, but looking at the code I think it is possible. In GetForeignPlan()
youhave to pass on the tlist argument, which I presume means that the query plan will use the tlist that I pass in,
right?If so, then it should be possible for me to write a function that takes tlist and baserel->reltargetlist and
returna version of tlist that knows which foreign-table columns are actually used, and replaces the rest with a NULL
constant.
>>
>> For example, suppose the original tlist is this: [VAR(attrno=1), VAR(attrno=2), VAR(attrno=3)] and reltarget list
saysthat I only need args 1 and 3. Then the new tlist would look like this: [VAR(attrno=1), CONST(val=NULL),
VAR(attrno=2)]where the attrno of the last VAR has been reduced by one because the 2 column is no longer there. 
>>
>> I did something very much like this in my roll-your-own version of FDW so I know basically how to do it, but I did
itat the pre-planning stage and I'm not sure how much is already packed into the other plan nodes at this point. Maybe
it'stoo late to change the target list? 
>>
>> Can anyone give me some advice or warnings on this? I'd hate to go to the trouble of implementing and testing it
onlyto find that I'm making some bogus assumptions. 
>>
>> Thanks,
>> David Gudeman
>>



Re: minimizing the target list for foreign data wrappers

From
David Fetter
Date:
David,

Please post your patch(es) and some demo of how things broke so others
can improve future versions--possibly even 9.3 versions if it turns
out you've discovered a bug in the implementation.

Thanks very much for your hard work and insights into this.

Cheers,
David.
On Tue, Apr 23, 2013 at 11:08:04AM -0700, David Gudeman wrote:
> In case anyone is interested, I tried it and it doesn't seem to work.
> It looks like some other plan element already has the target-list
> tuple baked. Now I'm trying to decide whether to give up on FDW. It's
> a shame because it's such a sweet facility, but at this point, I just
> don't think that it's mature enough for what I need to do.
> Regards,
> David Gudeman
> 
> 
> On Mon, Apr 22, 2013 at 11:27 AM, David Gudeman <dave.gudeman@gmail.com> wrote:
> > Re-reading my first email I thought it was a little confusing, so here
> > is some clarification. In GetForeignPlan, tlist seems to be a target
> > list for a basic "select *" from the foreign table. For the ith
> > TargetEntry te in tlist, it seems that te->expr is a var with
> > varattno=i. I was mis-remembering and calling varattno "attrno" in the
> > original email.
> >
> > My assumption is that the plan elements that use the output of the FDW
> > plan node will access columns indirectly using tlist. In other words,
> > I'm assuming that if there is a reference to a column c of the foreign
> > table, this column will be represented as a Var with varattno being an
> > offset into tlist. So if c is column number 3, for example, you get
> > its value by looking up TargetEntry number 3 in tlist and evaluate the
> > expr column for that TargetEntry. So if I change the  Var in the expr
> > column so the varattno points to a different column in the output
> > tuple, then everything will work.
> >
> > The two risky assumptions I'm making are 1. that it actually uses this
> > indirect way of looking up columns in a foreign table and 2. that it
> > actually uses the tlist that I pass in when I call make_foreignscan().
> >
> > Can anyone confirm or deny these assumptions?
> >
> > Thanks.
> >
> > On Sun, Apr 21, 2013 at 6:57 PM, David Gudeman <dave.gudeman@gmail.com> wrote:
> >>
> >> A few years ago I wrote a roll-your-own foreign-data-wrapper system for Postgres because Postgres didn't have one
atthe time (some details here (http://unobtainabol.blogspot.com/2013/04/dave-foreign-data-introuction.html) if anyone
isinterested). Now I'm being tasked to move it to Postgres 9.2.x and I'd like to use FDW if possible.
 
> >>
> >> One of the problems I'm having is that in my application, the foreign tables typically have hundreds of columns
whiletypical queries only access a dozen or so (the foreign server is a columnar SQL database). Furthermore, there is
nosize optimization for NULL values passed back from the foreign server, so if I return all of the columns from the
table--even as NULLs-- the returned data size will be several times the size that it needs to be. My application cannot
toleratethis level of inefficiency, so I need to return minimal columns from the foreign table.
 
> >>
> >> The documentation doesn't say how to do this, but looking at the code I think it is possible. In GetForeignPlan()
youhave to pass on the tlist argument, which I presume means that the query plan will use the tlist that I pass in,
right?If so, then it should be possible for me to write a function that takes tlist and baserel->reltargetlist and
returna version of tlist that knows which foreign-table columns are actually used, and replaces the rest with a NULL
constant.
> >>
> >> For example, suppose the original tlist is this: [VAR(attrno=1), VAR(attrno=2), VAR(attrno=3)] and reltarget list
saysthat I only need args 1 and 3. Then the new tlist would look like this: [VAR(attrno=1), CONST(val=NULL),
VAR(attrno=2)]where the attrno of the last VAR has been reduced by one because the 2 column is no longer there.
 
> >>
> >> I did something very much like this in my roll-your-own version of FDW so I know basically how to do it, but I did
itat the pre-planning stage and I'm not sure how much is already packed into the other plan nodes at this point. Maybe
it'stoo late to change the target list?
 
> >>
> >> Can anyone give me some advice or warnings on this? I'd hate to go to the trouble of implementing and testing it
onlyto find that I'm making some bogus assumptions.
 
> >>
> >> Thanks,
> >> David Gudeman
> >>
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: minimizing the target list for foreign data wrappers

From
Tom Lane
Date:
David Gudeman <dave.gudeman@gmail.com> writes:
> One of the problems I'm having is that in my application, the foreign
> tables typically have hundreds of columns while typical queries only access
> a dozen or so (the foreign server is a columnar SQL database). Furthermore,
> there is no size optimization for NULL values passed back from the foreign
> server, so if I return all of the columns from the table --even as NULLs--
> the returned data size will be several times the size that it needs to be.
> My application cannot tolerate this level of inefficiency, so I need to
> return minimal columns from the foreign table.

That's already possible; see contrib/postgres_fdw in HEAD for an
existence proof.

> The documentation doesn't say how to do this, but looking at the code I
> think it is possible. In GetForeignPlan() you have to pass on the tlist
> argument, which I presume means that the query plan will use the tlist that
> I pass in, right? If so, then it should be possible for me to write a
> function that takes tlist and baserel->reltargetlist and return a version
> of tlist that knows which foreign-table columns are actually used, and
> replaces the rest with a NULL constant.

You do not get to editorialize on the tlist that will be computed by the
ForeignScan node: in the case of a simple single-table SELECT, that's
going to be computing the expressions the user asked for.  Nor can you
alter the expectation about the rowtype of the scan tuple that's
returned by the FDW: that needs to match the declared rowtype of the
foreign table.  However, you can skip fetching unneeded columns and just
set those fields of the scan tuple to nulls.  That's cheap enough
(particularly if the scan tuple stays virtual) that I'm unconvinced we
should contort the APIs to the extent that would be needed to let the
FDW change the scan tuple rowtype dynamically.
        regards, tom lane