Thread: Exposing quals
Folks, I'd like to make it possible to see qualifiers from inside functions in PostgreSQL. For my particular case, and for dblink, it would allow user-space code to some remove bottlenecks which make currently make inter-DBMS communication extremely inefficient. The current code returns one (potentially) big string with the quals ANDed together. I'm thinking that the new code would: 1. Return just a pointer. 2. Add some functions like rsinfo_get_node_tree_str() which can turn same into (in this case, a string, but others might get tree structures) on demand from client code. 3. Add wrapper functions in each untrusted PL (trusted PLs shouldn't be doing anything that needs this). 4. Document the above. Please find attached the patch, and thanks to Neil Conway and Korry Douglas for the code, and to Jan Wieck for helping me hammer out the scheme above. Mistakes are all mine ;) Comments? Questions? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Attachment
On Mon, 2008-05-05 at 12:01 -0700, David Fetter wrote: > Please find attached the patch, and thanks to Neil Conway and Korry > Douglas for the code, and to Jan Wieck for helping me hammer out the > scheme above. Mistakes are all mine ;) I see no negative comments to this patch on -hackers. This was discussed here http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#SQL.2FMED and I had understood the consensus to be that we would go ahead with this? The notes say "Heikki doesn't think this is a long term solution", but in the following discussion it was the *only* way of doing this that will work with non-PostgreSQL databases. So it seems like the way we would want to go, yes? So, can we add this to the CommitFest July page so it can receive some substantial constructive/destructive comments? This could be an important feature in conjunction with Hot Standby. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
> > On Mon, 2008-05-05 at 12:01 -0700, David Fetter wrote: > >> Please find attached the patch, and thanks to Neil Conway and Korry >> Douglas for the code, and to Jan Wieck for helping me hammer out the >> scheme above. Mistakes are all mine ;) > > I see no negative comments to this patch on -hackers. > > This was discussed here > http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#SQL.2FMED > and I had understood the consensus to be that we would go ahead with > this? > > The notes say "Heikki doesn't think this is a long term solution", but > in the following discussion it was the *only* way of doing this that > will work with non-PostgreSQL databases. So it seems like the way we > would want to go, yes? > > So, can we add this to the CommitFest July page so it can receive some > substantial constructive/destructive comments? > > This could be an important feature in conjunction with Hot Standby. The notes say at the end: "Jan thinks that showing the node tree will work better. But others don't agree with him -- it wouldn't work for PL/perlU. But Jan thinks it would work to give it a pointer to the parse tree and the range, we'd need to add an access function for the PL." For the record, I agree with Jan's suggestion of passing a pointer to the parse tree, and offline gave David a suggestion verbally as to how this could be handled for PL/PerlU. I don't think we should be tied too closely to a string representation, although possibly the first and simplest callback function would simply stringify the quals. cheers andrew
On Mon, Jul 07, 2008 at 06:46:29PM -0400, Andrew Dunstan wrote: > > > > On Mon, 2008-05-05 at 12:01 -0700, David Fetter wrote: > > > >> Please find attached the patch, and thanks to Neil Conway and > >> Korry Douglas for the code, and to Jan Wieck for helping me > >> hammer out the scheme above. Mistakes are all mine ;) > > > > I see no negative comments to this patch on -hackers. > > > > This was discussed here > > http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#SQL.2FMED > > and I had understood the consensus to be that we would go ahead > > with this? > > > > The notes say "Heikki doesn't think this is a long term solution", > > but in the following discussion it was the *only* way of doing > > this that will work with non-PostgreSQL databases. So it seems > > like the way we would want to go, yes? > > > > So, can we add this to the CommitFest July page so it can receive > > some substantial constructive/destructive comments? > > > > This could be an important feature in conjunction with Hot > > Standby. > > The notes say at the end: > > "Jan thinks that showing the node tree will work better. But others > don't agree with him -- it wouldn't work for PL/perlU. But Jan > thinks it would work to give it a pointer to the parse tree and the > range, we'd need to add an access function for the PL." > > For the record, I agree with Jan's suggestion of passing a pointer > to the parse tree, and offline gave David a suggestion verbally as > to how this could be handled for PL/PerlU. > > I don't think we should be tied too closely to a string > representation, although possibly the first and simplest callback > function would simply stringify the quals. As I understand Jan's plan, the idea is to create a new relkind with an exit to user code at leaf nodes in the plan tree. This would require an API design for both user C code and for each PL to use, but would then allow PostgreSQL's optimizer to work on JOINs, etc. Jan, have I got that right so far? Do you have something in the way of a rough patch, docs, etc. for this? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, 2008-07-07 at 16:26 -0700, David Fetter wrote: > On Mon, Jul 07, 2008 at 06:46:29PM -0400, Andrew Dunstan wrote: > > > > > For the record, I agree with Jan's suggestion of passing a pointer > > to the parse tree, and offline gave David a suggestion verbally as > > to how this could be handled for PL/PerlU. > > > > I don't think we should be tied too closely to a string > > representation, although possibly the first and simplest callback > > function would simply stringify the quals. > > As I understand Jan's plan, the idea is to create a new relkind with > an exit to user code at leaf nodes in the plan tree. This would > require an API design for both user C code and for each PL to use, but > would then allow PostgreSQL's optimizer to work on JOINs, etc. > > Jan, have I got that right so far? Do you have something in the way > of a rough patch, docs, etc. for this? It sounds like we can make it happen as text for other DBMS and as plan nodes for PostgreSQL, which is the best solution all round. Personally not too worried which way we do this - as long as we do it for 8.4 :-) It's obviously happening in the background, so I'll leave it alone. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > On Mon, 2008-07-07 at 16:26 -0700, David Fetter wrote: > >> On Mon, Jul 07, 2008 at 06:46:29PM -0400, Andrew Dunstan wrote: >> >>> For the record, I agree with Jan's suggestion of passing a pointer >>> to the parse tree, and offline gave David a suggestion verbally as >>> to how this could be handled for PL/PerlU. >>> >>> I don't think we should be tied too closely to a string >>> representation, although possibly the first and simplest callback >>> function would simply stringify the quals. >>> >> As I understand Jan's plan, the idea is to create a new relkind with >> an exit to user code at leaf nodes in the plan tree. This would >> require an API design for both user C code and for each PL to use, but >> would then allow PostgreSQL's optimizer to work on JOINs, etc. >> >> Jan, have I got that right so far? Do you have something in the way >> of a rough patch, docs, etc. for this? >> > > It sounds like we can make it happen as text for other DBMS and as plan > nodes for PostgreSQL, which is the best solution all round. > > Personally not too worried which way we do this - as long as we do it > for 8.4 :-) It's obviously happening in the background, so I'll leave it > alone. > > I think the concept involving the plan tree is gold. Hannu Krosing mentioned some idea like that recently as well. If the function had the chance to tell the planner how it is gonna operate (e.g produces sorted output, etc.) it would be perfect. The golden thing here would be if we could teach a function whether it is " STREAMABLE | NOT STREAMABLE". streamable would make sure that we don't have to materialize the output of a set returning function. this would allow google-like analysis in postgresql easily by allowing to fetch data from any amount of data from any data source. best regards, hans -- Cybertec Schönig & Schönig GmbH PostgreSQL Solutions and Support Gröhrmühlgasse 26, A-2700 Wiener Neustadt Tel: +43/1/205 10 35 / 340 www.postgresql-support.de, www.postgresql-support.com
Simon Riggs wrote: > The notes say "Heikki doesn't think this is a long term solution", but > in the following discussion it was the *only* way of doing this that > will work with non-PostgreSQL databases. So it seems like the way we > would want to go, yes? How did you come to the conclusion that this is the only way that will work with non-PostgreSQL databases? I don't see any limitations like that in any of the proposed approaches. I guess I should clarify my position on this: We should start moving towards a full SQL:MED solution that will ultimately support pushing down joins, aggregates etc. to the remote database. Including support for transaction control, using 2PC, and cost estimation and intelligent planning. This should be done in an extensible way, so that people can write their own plugins to connect to different RDBMSs, as well as simple data sources like flat files. The plugin needs to be able to control which parts of a plan tree can be pushed down to the remote source, estimate the cost of remote execution, and map remote data types to local ones. And it then needs to be able to construct and execute the remote parts of a plan. We're obviously not going to get all that overnight, but whatever we implement now should be the first step towards that, rather than something that we need to deprecate and replace in the future. Unfortunately I don't see a way to extend the proposed "exposing quals to functions" patch to do more than just that. The list of functionality a full-blown plugin will need is quite long. I don't think there's any hope of supporting all that without reaching into some PostgreSQL internal data structures, particularly the planner structures like RelOptInfo, Path and Plan. The plugins will be more tightly integrated into the system than say user defined data types. They will need to be written in C, and they will be somewhat version dependent. Simpler plugins, like one to read CSV files, with no "pushing down" and no update support, will need less access to internals, and thus will be less version dependent, so pgfoundry projects like that will be feasible. Note that the dependency on internal data structures doesn't go away by saying that they're passed as text; the text representation of our data structures is version dependent as well. So what would the plugin API look like? To hook into the planner, I'm envisioning the plugin would define these functions: /* * Generate a remote plan for executing a whole subquery remotely. For * example, if the query is an aggregate, wemight be able to execute * the whole aggregate in the remote database. This will be called * from grouping_planner(),like optimize_minmax_aggregates(). * Returns NULL if remote execution is not possible. (a dummy *implementation can always return NULL. */ Plan *generate_remote_path(PlannerInfo *, List *tlist); /* * Generate a path for executing one relation in remote * database. The relation can be a base (non-join) remoterelation, * or a join involving a remote relation. Can return NULL for join * relations if the join can't be executedremotely. */ Path *generate_remote_path(PlannerInfo *, RelOptInfo *) /* * Create a Plan node from a Path. Called from create_plan, when * the planner chooses to use a remote path. A typicalimplementation * would create the SQL string to be executed in the remote database, * and return a RemotePlannode with that SQL string in it. */ Plan *create_remote_plan(PlannerInfo *, RemotePath *) On the execution side, the plugin needs to be able to execute a previously generated RemotePlan. There would be a new executor node type, a RemoteScan, that would be similar to a seq scan or index scan, but delegates the actual execution to the plugin. The execution part of the plugin API would reflect the API of executor nodes, something like: void *scan_open(RemotePlan *) HeapTuple *scan_getnext(void *scanstate) void scan_close(void *scanstate) The presumption here is that you would define remote tables with the appropriate SQL:MED statements beforehand (CREATE FOREIGN TABLE). However, it is flexible enough that you could implement the "exposing quals to functions" functionality with this as well: generate_remote_path() would need to recognize the function scans that it can handle, and return a RemotePath struct with all the same information as create_functionscan_path does (the cost estimates could be adjusted for the pushed down quals at this point as well). create_remote_plan would return a FunctionScan node, but with the extra qualifiers passed into the function as arguments. In case of dblink, it could just add extra WHERE clauses to the query that's being passed as argument. I'm not proposing that we do the stuff described in this paragraph, just using it as an example of the flexibility. BTW, I think the "exposing quals to functions" functionality could be implemented as a planner hook as well. The hook would call the standard planner, and modify the plan tree after that, passing the quals as extra arguments to functions that can take advantage of them. A "foreign data wrapper" interface is also defined in the SQL/MED standard. I've only looked at it briefly, but it seems provide roughly the same functionality as the API I defined above. It would be a good idea to look at that, though I don't think that part of the standard is very widely adopted. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2008-07-08 at 17:51 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > The notes say "Heikki doesn't think this is a long term solution", but > > in the following discussion it was the *only* way of doing this that > > will work with non-PostgreSQL databases. So it seems like the way we > > would want to go, yes? > > How did you come to the conclusion that this is the only way that will > work with non-PostgreSQL databases? SQL, in text form, is the interface to other databases. You can't pass half a plan tree to Oracle, especially not a PostgreSQL plan tree. It has to be text if you wish to send a query to another RDBMS, or another version of PostgreSQL. > We should start moving towards a full SQL:MED solution that will > ultimately support pushing down joins, aggregates etc. to the remote > database. Including support for transaction control, using 2PC, and > cost estimation and intelligent planning. > > This should be done in an extensible way, so that people can write > their own plugins to connect to different RDBMSs, as well as simple > data sources like flat files. The plugin needs to be able to control > which parts of a plan tree can be pushed down to the remote source, > estimate the cost of remote execution, and map remote data types to > local ones. And it then needs to be able to construct and execute the > remote parts of a plan. So if I understand you, you want to pass the partial plan tree and then have a plugin construct the SQL text. Sounds like a great approach. Maybe you thought I meant internal interfaces should be in text? No, that would be bizarre. I meant we should not attempt to pass partial plan trees outside of the database, since that would limit the feature to only working with the same version of PostgreSQL database. I support your wish to have something that can work with other types of database. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > SQL, in text form, is the interface to other databases. You can't pass > half a plan tree to Oracle, especially not a PostgreSQL plan tree. It > has to be text if you wish to send a query to another RDBMS, or another > version of PostgreSQL. Oh, I see. Agreed. Though note that there's big differences in SQL dialects, so a one-size-fits-all approach to generating SQL to be executed in the remote database won't work. (not that I think anyone has suggested that) > So if I understand you, you want to pass the partial plan tree and then > have a plugin construct the SQL text. Exactly. > Maybe you thought I meant internal interfaces should be in text? Yeah, that's exactly what I thought you meant. > No, > that would be bizarre. I meant we should not attempt to pass partial > plan trees outside of the database, since that would limit the feature > to only working with the same version of PostgreSQL database. Agreed. I'm glad we're on the same page now. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Jul 08, 2008 at 06:22:23PM +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: >> SQL, in text form, is the interface to other databases. You can't >> pass half a plan tree to Oracle, especially not a PostgreSQL plan >> tree. It has to be text if you wish to send a query to another >> RDBMS, or another version of PostgreSQL. > > Oh, I see. Agreed. > > Though note that there's big differences in SQL dialects, so a > one-size-fits-all approach to generating SQL to be executed in the > remote database won't work. (not that I think anyone has suggested > that) > >> So if I understand you, you want to pass the partial plan tree and >> then have a plugin construct the SQL text. > > Exactly. > >> Maybe you thought I meant internal interfaces should be in text? > > Yeah, that's exactly what I thought you meant. > >> No, that would be bizarre. I meant we should not attempt to pass >> partial plan trees outside of the database, since that would limit >> the feature to only working with the same version of PostgreSQL >> database. > > Agreed. I'm glad we're on the same page now. Everybody's weighed in on this thread except the guy who's actually doing the work. Jan? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 7/8/2008 11:38 AM, David Fetter wrote: > On Tue, Jul 08, 2008 at 06:22:23PM +0300, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> SQL, in text form, is the interface to other databases. You can't >>> pass half a plan tree to Oracle, especially not a PostgreSQL plan >>> tree. It has to be text if you wish to send a query to another >>> RDBMS, or another version of PostgreSQL. >> >> Oh, I see. Agreed. >> >> Though note that there's big differences in SQL dialects, so a >> one-size-fits-all approach to generating SQL to be executed in the >> remote database won't work. (not that I think anyone has suggested >> that) >> >>> So if I understand you, you want to pass the partial plan tree and >>> then have a plugin construct the SQL text. >> >> Exactly. >> >>> Maybe you thought I meant internal interfaces should be in text? >> >> Yeah, that's exactly what I thought you meant. >> >>> No, that would be bizarre. I meant we should not attempt to pass >>> partial plan trees outside of the database, since that would limit >>> the feature to only working with the same version of PostgreSQL >>> database. >> >> Agreed. I'm glad we're on the same page now. > > Everybody's weighed in on this thread except the guy who's actually > doing the work. > > Jan? Here, I talked to my supervisor here in Toronto (that's where I am this week) and Afilias actually sees enough value in this for me to go and spend time officially on it. The ideas I have so far are as follows: Down in the exec nodes like SeqScan or IndexScan, there are several parts available that are important. - Scanned relation - Targetlist - Filter (for SeqScan) - IndexQual (for IndexScan) These pieces are available at least in the scans Init function and actually can be converted back into some SQL statement that effectively represents this one single table scan. However, parsing it back at that point is nonsense, as we cannot expect everything out there to actually be an SQL database. Also, both the qualification as well as the targetlist can contain things like user defined function calls. We neither want to deny nor require that this sort of construct is actually handed over to the external data source, so the interface needs to be more flexible. Therefore it is best to divide the functionality into several user exit functions. The several functions that implement a scan type inside of the executor very much resemble opening a cursor for a single table query, fetching rows from it, eventually (in the case of a nested loop for example) close and reopen the cursor with different key values from the outer tuple, close the cursor. So it makes total sense to actually require an implementation of an external data source to provide functions to open a cursor, fetch rows, close the cursor. There will be some connection and transaction handling around all this that I have in mind but think it would distract from the problem to be solved right here, so more about that another time. The C implementation for open cursor would be called with a scan handle, containing the connection, the relname, the targetlist and the qualification subtrees. These are modified from the real ones in the scan node so that all Var's have varno=1 and that all OUTER Var's have been replaced with a Const that reflects the current outer tuples values. From here there are several support functions available to "dumb down" each of those to whatever the external data source may support. In case of the targetlist, this could mean to filter out a unique list of Var nodes only, removing all expressions from it. In case of the qualification, this could mean remove everything that isn't a standard operator (=, <>, ...), or remove everything that isn't Postgres builtin. Finally, there is a support function that will build a SQL statement according to what's left inside that scan handle. The scan handle would track which modifications have been done to the various pieces so that the outer support framework knows if it gets back the originally requested targetlist, or if it has to run the projection on the returned unique list of Var's. And if it has to recheck the returned tuples for qualification, because some of the qual's had been removed. In order to allow the user exits to be written in PL's, I can think of makiing a complex data type containing the scan handle. The subtrees could be accessed by the PL via support functions that return them in nodeToString() or other formats. I'll try to write up a more complete proposal until end of next week. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
On Tue, Jul 08, 2008 at 02:41:45PM -0400, Jan Wieck wrote: > Here, > > I talked to my supervisor here in Toronto (that's where I am this week) > and Afilias actually sees enough value in this for me to go and spend > time officially on it. Yay! > The ideas I have so far are as follows: > > Down in the exec nodes like SeqScan or IndexScan, there are several > parts available that are important. > > - Scanned relation > - Targetlist > - Filter (for SeqScan) > - IndexQual (for IndexScan) > > These pieces are available at least in the scans Init function and > actually can be converted back into some SQL statement that effectively > represents this one single table scan. However, parsing it back at that > point is nonsense, as we cannot expect everything out there to actually > be an SQL database. > > Also, both the qualification as well as the targetlist can contain > things like user defined function calls. We neither want to deny nor > require that this sort of construct is actually handed over to the > external data source, so the interface needs to be more flexible. > Therefore it is best to divide the functionality into several user exit > functions. Right :) > The several functions that implement a scan type inside of the > executor very much resemble opening a cursor for a single table > query, fetching rows from it, eventually (in the case of a nested > loop for example) close and reopen the cursor with different key > values from the outer tuple, close the cursor. So it makes total > sense to actually require an implementation of an external data > source to provide functions to open a cursor, fetch rows, close the > cursor. That's not unreasonable, given that the simplest kind of external data source would likely be along the lines of fopen(). > There will be some connection and transaction handling around all > this that I have in mind but think it would distract from the > problem to be solved right here, so more about that another time. Maybe something like a way of setting, for each relation, what kind (if any) of transactions are available? > The C implementation for open cursor would be called with a scan > handle, containing the connection, the relname, the targetlist and > the qualification subtrees. These are modified from the real ones > in the scan node so that all Var's have varno=1 and that all OUTER > Var's have been replaced with a Const that reflects the current > outer tuples values. From here there are several support functions > available to "dumb down" each of those to whatever the external > data source may support. In case of the targetlist, this could mean > to filter out a unique list of Var nodes only, removing all > expressions from it. In case of the qualification, this could mean > remove everything that isn't a standard operator (=, <>, ...), or > remove everything that isn't Postgres builtin. Finally, there is a > support function that will build a SQL statement according to > what's left inside that scan handle. Interesting. > The scan handle would track which modifications have been done to the > various pieces so that the outer support framework knows if it gets back > the originally requested targetlist, or if it has to run the projection > on the returned unique list of Var's. And if it has to recheck the > returned tuples for qualification, because some of the qual's had been > removed. > > In order to allow the user exits to be written in PL's, I can think of > makiing a complex data type containing the scan handle. The subtrees > could be accessed by the PL via support functions that return them in > nodeToString() or other formats. > > I'll try to write up a more complete proposal until end of next week. Yay! Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate