Thread: Surfacing qualifiers
Folks, Neil Conway sent me a patch that sketched out a plan to make quals visible to functions, and Korry Douglas filled in much of the rest of what you see attached here. Mistakes are all mine. :) Random observations: * It appears I've botched the call to deparse_context_for_plan in src/backend/executor/execQual.c and/or made some more fundamental error because when testing with PL/Perl, I only see quals inside the function when there is just one. Where are the APIs for things like deparse_context_for_plan() documented? * This stuff should be available to all the PLs. How might it work in PL/PgSQL, etc.? * The patch just appends a WHERE clause on when it finds quals in contrib/dblink/dblink.c. While this is probably a good place to start, it might be good to explore some kind of approach that allows more subtlety. * In PL/Perl, $_TD->{_quals} gets the qualifiers, but they really should go in their own variable. I'm thinking that one should be called $_QUAL. * More generally, it would be nice to have the quals in some kind of data structure so the calling code could do smarter things with them than the current string-mashing the example code does. Help, comments, brickbats, etc. appreciated :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Attachment
David Fetter wrote: > Folks, > > Neil Conway sent me a patch that sketched out a plan to make quals > visible to functions > er, what? Please explain what this means, why it might be useful. Example(s) would help. > * In PL/Perl, $_TD->{_quals} gets the qualifiers, but they really > should go in their own variable. I'm thinking that one should be > called $_QUAL. > > $_TD is only there for triggers. Is this a trigger feature? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > David Fetter wrote: >> Neil Conway sent me a patch that sketched out a plan to make quals >> visible to functions > er, what? Please explain what this means, why it might be useful. It's utterly useless, because it only exposes a small fraction of the information that would be needed to do anything. regards, tom lane
On Wed, Mar 26, 2008 at 08:31:04AM -0400, Andrew Dunstan wrote: > David Fetter wrote: >> Folks, >> >> Neil Conway sent me a patch that sketched out a plan to make quals >> visible to functions >> > er, what? Please explain what this means, why it might be useful. > Example(s) would help. Right now, user-callable code has no way to access the predicates of the query which calls it. Neil's patch was intended to expose some of them. The biggest use cases I see are for inter-database communication such as dblink and friends. The dblink part of the patch shows what's supposed to be happening. What happens now with dblink is that the remote table (more generally, the output of a fixed query) gets materialized into memory in its entirety, and if it's bigger than what's available, it will crash the backend or worse. That happens because functions do not have any access to the predicates with which they were called, so the current workaround is to pass the predicates manually and then cast. Speaking of casting, how hard would it be to make something like this work? SELECT foo, bar, baz FROM function_returning_setof_record('parameters','here') AS f(compound_type); >> * In PL/Perl, $_TD->{_quals} gets the qualifiers, but they really >> should go in their own variable. I'm thinking that one should be >> called $_QUAL. > > $_TD is only there for triggers. Is this a trigger feature? Clearly it is not, but the procedure for adding a new variable which needs to be available to functions is far from clear, so Korry decided to wedge it into that existing variable to get a proof of concept out the door. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > What happens now with dblink is that the remote table (more generally, > the output of a fixed query) gets materialized into memory in its > entirety, and if it's bigger than what's available, it will crash the > backend or worse. This is utter nonsense. It gets put into a tuplestore which is entirely capable of spilling to disk. Slow, yes, but crashing is a lie. > That happens because functions do not have any > access to the predicates with which they were called, so the current > workaround is to pass the predicates manually and then cast. dblink is not a suitable framework for improving that situation. Maybe someday we'll have a proper implementation of SQL/MED ... regards, tom lane
On Wed, Mar 26, 2008 at 01:21:14PM -0400, Tom Lane wrote: > David Fetter <david@fetter.org> writes: > > What happens now with dblink is that the remote table (more generally, > > the output of a fixed query) gets materialized into memory in its > > entirety, and if it's bigger than what's available, it will crash the > > backend or worse. > > This is utter nonsense. It gets put into a tuplestore which is entirely > capable of spilling to disk. Slow, yes, but crashing is a lie. > > > That happens because functions do not have any > > access to the predicates with which they were called, so the current > > workaround is to pass the predicates manually and then cast. > > dblink is not a suitable framework for improving that situation. > Maybe someday we'll have a proper implementation of SQL/MED ... This is intended to be a step or two along the way :) You mentioned in an earlier mail that the information exposed was inadequate. Could you sketch out what information would really be needed and where to find it? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > You mentioned in an earlier mail that the information exposed was > inadequate. Could you sketch out what information would really be > needed and where to find it? The main problem with what you suggest is that it'll fail utterly on join queries. AFAICS any real improvement in the situation will require exposing remote tables as a concept understood by the planner, complete with ways to obtain index and statistical information at plan time. After suitable decisions about join strategy and so forth, we'd wind up with a plan containing a "RemoteTableScan" node which would have some restriction conditions attached. Then forwarding those to the remote database would be sensible. But expecting a dblink function to figure out which clauses are restrictions for its table, when it doesn't even know what other tables were in the query, is not sensible. regards, tom lane
David Fetter wrote: >> >> dblink is not a suitable framework for improving that situation. >> Maybe someday we'll have a proper implementation of SQL/MED ... >> > > This is intended to be a step or two along the way :) > > > I'm still waiting to see an example of where you say this patch is even marginally useful. cheers andrew
On Wed, 2008-03-26 at 14:38 -0400, Andrew Dunstan wrote: > I'm still waiting to see an example of where you say this patch is even > marginally useful. It's not hard to think of one: SELECT * FROM remote_table() WHERE x = 5; Applying the predicate on the remote database (pushing the predicate below the function scan) is an elementary optimization, and in many cases would be enormously more efficient than materializing the entire remote table at the local site and then applying the qual there. Certainly I agree with Tom that proper SQL/MED support requires significant support from both the executor and the optimizer. This is just a quick hack to take advantage of the existing predicate pushdown logic -- I just thought it was a cute trick, not something I'd suggest we include in mainline sources. -Neil
On Wed, Mar 26, 2008 at 02:26:41PM -0700, Neil Conway wrote: > On Wed, 2008-03-26 at 14:38 -0400, Andrew Dunstan wrote: > > I'm still waiting to see an example of where you say this patch is > > even marginally useful. > > It's not hard to think of one: > > SELECT * FROM remote_table() WHERE x = 5; In DBI-Link's case, remote_table() may not have all the nice features a Postgres data store would. It might or might not have indexes, for example, and the best hints it could get might well be those predicates supplied by DBI-Link. > Applying the predicate on the remote database (pushing the predicate > below the function scan) is an elementary optimization, and in many > cases would be enormously more efficient than materializing the > entire remote table at the local site and then applying the qual > there. > > Certainly I agree with Tom that proper SQL/MED support requires > significant support from both the executor and the optimizer. > This is just a quick hack to take advantage of the existing > predicate pushdown logic -- I just thought it was a cute trick, not > something I'd suggest we include in mainline sources. I disagree that it's "just" a cute trick. I've managed to get dblink to use it transparently with dblink :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Neil Conway <neilc@samurai.com> writes: > Certainly I agree with Tom that proper SQL/MED support requires > significant support from both the executor and the optimizer. This is > just a quick hack to take advantage of the existing predicate pushdown > logic -- I just thought it was a cute trick, not something I'd suggest > we include in mainline sources. My bad, I had misread where you were pulling the quals from. There are still too many cases where it wouldn't work (anyplace EXPLAIN shades the truth, including Params and subplans) but it's not quite as useless as I first thought. Possibly the Param issue could be fixed by inserting the current actual value of the Param instead of listing it out as $n. The real problem is still going to be joins, though. You'd really really want the thing to be capable of generating the equivalent of a nestloop-with-inner-indexscan plan, because that is the only join type that has any hope of not sucking down the entire content of the remote table. And that's not gonna happen without some amount of planner support. It might be interesting to think about using the planner's get_relation_info_hook to attach pseudo indexes to RTE_FUNCTION relations ... though where it gets the data from is not clear. regards, tom lane
Tom Lane wrote: > David Fetter <david@fetter.org> writes: >> You mentioned in an earlier mail that the information exposed was >> inadequate. Could you sketch out what information would really be >> needed and where to find it? > > The main problem with what you suggest is that it'll fail utterly > on join queries. > > AFAICS any real improvement in the situation will require exposing > remote tables as a concept understood by the planner, complete > with ways to obtain index and statistical information at plan time. > After suitable decisions about join strategy and so forth, we'd > wind up with a plan containing a "RemoteTableScan" node which I'd like to point out that Remote* might be a bit to narrow because its also a general potential for SRF functions (e.g. any virtual table construction). Would certainly be nice if we had a as general approach as possible. Cheers Tino