Re: Exposing quals - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: Exposing quals
Date
Msg-id 4873B4E9.4000203@Yahoo.com
Whole thread Raw
In response to Re: Exposing quals  (David Fetter <david@fetter.org>)
Responses Re: Exposing quals  (David Fetter <david@fetter.org>)
List pgsql-hackers
On 7/8/2008 11:38 AM, David Fetter wrote:
> On Tue, Jul 08, 2008 at 06:22:23PM +0300, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> SQL, in text form, is the interface to other databases. You can't
>>> pass half a plan tree to Oracle, especially not a PostgreSQL plan
>>> tree. It has to be text if you wish to send a query to another
>>> RDBMS, or another version of PostgreSQL.
>>
>> Oh, I see. Agreed.
>>
>> Though note that there's big differences in SQL dialects, so a
>> one-size-fits-all approach to generating SQL to be executed in the
>> remote database won't work. (not that I think anyone has suggested
>> that)
>>
>>> So if I understand you, you want to pass the partial plan tree and
>>> then have a plugin construct the SQL text.
>>
>> Exactly.
>>
>>> Maybe you thought I meant internal interfaces should be in text?
>>
>> Yeah, that's exactly what I thought you meant.
>>
>>> No, that would be bizarre. I meant we should not attempt to pass
>>> partial plan trees outside of the database, since that would limit
>>> the feature to only working with the same version of PostgreSQL
>>> database. 
>>
>> Agreed. I'm glad we're on the same page now.
> 
> Everybody's weighed in on this thread except the guy who's actually
> doing the work.
> 
> Jan?

Here,

I talked to my supervisor here in Toronto (that's where I am this week) 
and Afilias actually sees enough value in this for me to go and spend 
time officially on it.

The ideas I have so far are as follows:

Down in the exec nodes like SeqScan or IndexScan, there are several 
parts available that are important.
    - Scanned relation    - Targetlist    - Filter (for SeqScan)    - IndexQual (for IndexScan)

These pieces are available at least in the scans Init function and 
actually can be converted back into some SQL statement that effectively 
represents this one single table scan. However, parsing it back at that 
point is nonsense, as we cannot expect everything out there to actually 
be an SQL database.

Also, both the qualification as well as the targetlist can contain 
things like user defined function calls. We neither want to deny nor 
require that this sort of construct is actually handed over to the 
external data source, so the interface needs to be more flexible. 
Therefore it is best to divide the functionality into several user exit 
functions.

The several functions that implement a scan type inside of the executor 
very much resemble opening a cursor for a single table query, fetching 
rows from it, eventually (in the case of a nested loop for example) 
close and reopen the cursor with different key values from the outer 
tuple, close the cursor. So it makes total sense to actually require an 
implementation of an external data source to provide functions to open a 
cursor, fetch rows, close the cursor.

There will be some connection and transaction handling around all this 
that I have in mind but think it would distract from the problem to be 
solved right here, so more about that another time.

The C implementation for open cursor would be called with a scan handle, 
containing the connection, the relname, the targetlist and the 
qualification subtrees. These are modified from the real ones in the 
scan node so that all Var's have varno=1 and that all OUTER Var's have 
been replaced with a Const that reflects the current outer tuples 
values. From here there are several support functions available to "dumb 
down" each of those to whatever the external data source may support. In 
case of the targetlist, this could mean to filter out a unique list of 
Var nodes only, removing all expressions from it. In case of the 
qualification, this could mean remove everything that isn't a standard 
operator (=, <>, ...), or remove everything that isn't Postgres builtin. 
Finally, there is a support function that will build a SQL statement 
according to what's left inside that scan handle.

The scan handle would track which modifications have been done to the 
various pieces so that the outer support framework knows if it gets back 
the originally requested targetlist, or if it has to run the projection 
on the returned unique list of Var's. And if it has to recheck the 
returned tuples for qualification, because some of the qual's had been 
removed.

In order to allow the user exits to be written in PL's, I can think of 
makiing a complex data type containing the scan handle. The subtrees 
could be accessed by the PL via support functions that return them in 
nodeToString() or other formats.

I'll try to write up a more complete proposal until end of next week.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Identifier case folding notes
Next
From: Teodor Sigaev
Date:
Subject: Re: [PATCHES] GIN improvements