Thread: Surfacing qualifiers

Surfacing qualifiers

From

David Fetter

Date:

26 March 2008, 04:02:06

Folks,

Neil Conway sent me a patch that sketched out a plan to make quals
visible to functions, and Korry Douglas filled in much of the rest of
what you see attached here.  Mistakes are all mine. :)

Random observations:

* It appears I've botched the call to deparse_context_for_plan in
  src/backend/executor/execQual.c and/or made some more fundamental
  error because when testing with PL/Perl, I only see quals inside the
  function when there is just one.  Where are the APIs for things like
  deparse_context_for_plan() documented?

* This stuff should be available to all the PLs.  How might it work in
  PL/PgSQL, etc.?

* The patch just appends a WHERE clause on when it finds quals in
  contrib/dblink/dblink.c.  While this is probably a good place to
  start, it might be good to explore some kind of approach that allows
  more subtlety.

* In PL/Perl, $_TD->{_quals} gets the qualifiers, but they really
  should go in their own variable.  I'm thinking that one should be
  called $_QUAL.

* More generally, it would be nice to have the quals in some kind of
  data structure so the calling code could do smarter things with them
  than the current string-mashing the example code does.

Help, comments, brickbats, etc. appreciated :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment

db_link_qual_pushdown-3.patch

Re: Surfacing qualifiers

From

Andrew Dunstan

Date:

26 March 2008, 09:31:57

David Fetter wrote:
> Folks,
>
> Neil Conway sent me a patch that sketched out a plan to make quals
> visible to functions
>   

er, what? Please explain what this means, why it might be useful. 
Example(s) would help.

> * In PL/Perl, $_TD->{_quals} gets the qualifiers, but they really
>   should go in their own variable.  I'm thinking that one should be
>   called $_QUAL.
>
>   

$_TD is only there for triggers. Is this a trigger feature?

cheers

andrew

Re: Surfacing qualifiers

From

Tom Lane

Date:

26 March 2008, 11:41:26

Andrew Dunstan <andrew@dunslane.net> writes:
> David Fetter wrote:
>> Neil Conway sent me a patch that sketched out a plan to make quals
>> visible to functions

> er, what? Please explain what this means, why it might be useful. 

It's utterly useless, because it only exposes a small fraction of the
information that would be needed to do anything.
        regards, tom lane

Re: Surfacing qualifiers

From

David Fetter

Date:

26 March 2008, 13:54:46

On Wed, Mar 26, 2008 at 08:31:04AM -0400, Andrew Dunstan wrote:
> David Fetter wrote:
>> Folks,
>>
>> Neil Conway sent me a patch that sketched out a plan to make quals
>> visible to functions
>>   
> er, what? Please explain what this means, why it might be useful.
> Example(s) would help.

Right now, user-callable code has no way to access the predicates of
the query which calls it.  Neil's patch was intended to expose some of
them.  The biggest use cases I see are for inter-database
communication such as dblink and friends.  The dblink part of the
patch shows what's supposed to be happening.

What happens now with dblink is that the remote table (more generally,
the output of a fixed query) gets materialized into memory in its
entirety, and if it's bigger than what's available, it will crash the
backend or worse.  That happens because functions do not have any
access to the predicates with which they were called, so the current
workaround is to pass the predicates manually and then cast.

Speaking of casting, how hard would it be to make something like this
work?

SELECT foo, bar, baz
FROM function_returning_setof_record('parameters','here') AS f(compound_type);

>> * In PL/Perl, $_TD->{_quals} gets the qualifiers, but they really
>>   should go in their own variable.  I'm thinking that one should be
>>   called $_QUAL.
>
> $_TD is only there for triggers. Is this a trigger feature?

Clearly it is not, but the procedure for adding a new variable which
needs to be available to functions is far from clear, so Korry decided
to wedge it into that existing variable to get a proof of concept out
the door.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Surfacing qualifiers

From

Tom Lane

Date:

26 March 2008, 14:21:26

David Fetter <david@fetter.org> writes:
> What happens now with dblink is that the remote table (more generally,
> the output of a fixed query) gets materialized into memory in its
> entirety, and if it's bigger than what's available, it will crash the
> backend or worse.

This is utter nonsense.  It gets put into a tuplestore which is entirely
capable of spilling to disk.  Slow, yes, but crashing is a lie.

> That happens because functions do not have any
> access to the predicates with which they were called, so the current
> workaround is to pass the predicates manually and then cast.

dblink is not a suitable framework for improving that situation.
Maybe someday we'll have a proper implementation of SQL/MED ...
        regards, tom lane

Re: Surfacing qualifiers

From

David Fetter

Date:

26 March 2008, 14:28:06

On Wed, Mar 26, 2008 at 01:21:14PM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > What happens now with dblink is that the remote table (more generally,
> > the output of a fixed query) gets materialized into memory in its
> > entirety, and if it's bigger than what's available, it will crash the
> > backend or worse.
> 
> This is utter nonsense.  It gets put into a tuplestore which is entirely
> capable of spilling to disk.  Slow, yes, but crashing is a lie.
> 
> > That happens because functions do not have any
> > access to the predicates with which they were called, so the current
> > workaround is to pass the predicates manually and then cast.
> 
> dblink is not a suitable framework for improving that situation.
> Maybe someday we'll have a proper implementation of SQL/MED ...

This is intended to be a step or two along the way :)

You mentioned in an earlier mail that the information exposed was
inadequate.  Could you sketch out what information would really be
needed and where to find it?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Surfacing qualifiers

From

Tom Lane

Date:

26 March 2008, 15:04:17

David Fetter <david@fetter.org> writes:
> You mentioned in an earlier mail that the information exposed was
> inadequate.  Could you sketch out what information would really be
> needed and where to find it?

The main problem with what you suggest is that it'll fail utterly
on join queries.

AFAICS any real improvement in the situation will require exposing
remote tables as a concept understood by the planner, complete
with ways to obtain index and statistical information at plan time.
After suitable decisions about join strategy and so forth, we'd
wind up with a plan containing a "RemoteTableScan" node which
would have some restriction conditions attached.  Then forwarding
those to the remote database would be sensible.  But expecting a
dblink function to figure out which clauses are restrictions for
its table, when it doesn't even know what other tables were in
the query, is not sensible.
        regards, tom lane

Re: Surfacing qualifiers

From

Andrew Dunstan

Date:

26 March 2008, 15:39:01


David Fetter wrote:
>>
>> dblink is not a suitable framework for improving that situation.
>> Maybe someday we'll have a proper implementation of SQL/MED ...
>>     
>
> This is intended to be a step or two along the way :)
>
>
>   

I'm still waiting to see an example of where you say this patch is even 
marginally useful.


cheers

andrew

Re: Surfacing qualifiers

From

Neil Conway

Date:

26 March 2008, 18:26:48

On Wed, 2008-03-26 at 14:38 -0400, Andrew Dunstan wrote:
> I'm still waiting to see an example of where you say this patch is even 
> marginally useful.

It's not hard to think of one:

SELECT * FROM remote_table() WHERE x = 5;

Applying the predicate on the remote database (pushing the predicate
below the function scan) is an elementary optimization, and in many
cases would be enormously more efficient than materializing the entire
remote table at the local site and then applying the qual there.

Certainly I agree with Tom that proper SQL/MED support requires
significant support from both the executor and the optimizer. This is
just a quick hack to take advantage of the existing predicate pushdown
logic -- I just thought it was a cute trick, not something I'd suggest
we include in mainline sources.

-Neil

Re: Surfacing qualifiers

From

David Fetter

Date:

26 March 2008, 19:05:04

On Wed, Mar 26, 2008 at 02:26:41PM -0700, Neil Conway wrote:
> On Wed, 2008-03-26 at 14:38 -0400, Andrew Dunstan wrote:
> > I'm still waiting to see an example of where you say this patch is
> > even marginally useful.
> 
> It's not hard to think of one:
> 
> SELECT * FROM remote_table() WHERE x = 5;

In DBI-Link's case, remote_table() may not have all the nice features
a Postgres data store would.  It might or might not have indexes, for
example, and the best hints it could get might well be those
predicates supplied by DBI-Link.

> Applying the predicate on the remote database (pushing the predicate
> below the function scan) is an elementary optimization, and in many
> cases would be enormously more efficient than materializing the
> entire remote table at the local site and then applying the qual
> there.
> 
> Certainly I agree with Tom that proper SQL/MED support requires
> significant support from both the executor and the optimizer.
> This is just a quick hack to take advantage of the existing
> predicate pushdown logic -- I just thought it was a cute trick, not
> something I'd suggest we include in mainline sources.

I disagree that it's "just" a cute trick.  I've managed to get dblink
to use it transparently with dblink :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Surfacing qualifiers

From

Tom Lane

Date:

26 March 2008, 19:33:55

Neil Conway <neilc@samurai.com> writes:
> Certainly I agree with Tom that proper SQL/MED support requires
> significant support from both the executor and the optimizer. This is
> just a quick hack to take advantage of the existing predicate pushdown
> logic -- I just thought it was a cute trick, not something I'd suggest
> we include in mainline sources.

My bad, I had misread where you were pulling the quals from.  There are
still too many cases where it wouldn't work (anyplace EXPLAIN shades the
truth, including Params and subplans) but it's not quite as useless as
I first thought.

Possibly the Param issue could be fixed by inserting the current actual
value of the Param instead of listing it out as $n.

The real problem is still going to be joins, though.  You'd really
really want the thing to be capable of generating the equivalent of
a nestloop-with-inner-indexscan plan, because that is the only join type
that has any hope of not sucking down the entire content of the remote
table.  And that's not gonna happen without some amount of planner
support.

It might be interesting to think about using the planner's
get_relation_info_hook to attach pseudo indexes to RTE_FUNCTION
relations ... though where it gets the data from is not clear.
        regards, tom lane

Re: Surfacing qualifiers

From

Tino Wildenhain

Date:

28 March 2008, 07:17:34

Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
>> You mentioned in an earlier mail that the information exposed was
>> inadequate.  Could you sketch out what information would really be
>> needed and where to find it?
> 
> The main problem with what you suggest is that it'll fail utterly
> on join queries.
> 
> AFAICS any real improvement in the situation will require exposing
> remote tables as a concept understood by the planner, complete
> with ways to obtain index and statistical information at plan time.
> After suitable decisions about join strategy and so forth, we'd
> wind up with a plan containing a "RemoteTableScan" node which

I'd like to point out that Remote* might be a bit to narrow because
its also a general potential for SRF functions (e.g. any virtual table
construction). Would certainly be nice if we had a as general approach
as possible.

Cheers
Tino