Re: foreign data wrappers - Mailing list pgsql-students

From Zheng Yang
Subject Re: foreign data wrappers
Date
Msg-id E1C78427-1287-46D8-8E66-8DC56E234F9A@gmail.com
Whole thread Raw
In response to Re: foreign data wrappers  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-students

Hi Andrew and Guillaume,



On 03/29/2011 12:35 PM, Guillaume Lelarge wrote:
Le 29/03/2011 18:32, Andrew Dunstan a écrit :

On 03/29/2011 11:48 AM, Guillaume Lelarge wrote:
Le 29/03/2011 13:28, Zheng Yang a écrit :
Hi guys,

I've briefly gone through the slides. Regarding the 6 callbacks, is
that correct to say that a full table scan will always be performed
irregardless of the sql statement,
the FDW is blind to the sql query performed, right?
Yes, fairly much. If the feed is large you need some way to pass a
limit to the foreign side, possibly via table options. I'm fairly
sure you won't be able to get it via the SELECT statement.

Regarding the previous flickr example, I'm wondering how this 'free
text search' function can be done if the FDW is blind to the SELECT
statement.

For instance, the following query is to retrieve a photo relevant to
'panda':

    SELECT photo FROM flickr_table WHERE search LIKE '%panda%';

In this case, the FDW can only open a connection to flickr web
service and return the next 'row' .
The problem is that there are a huge number of photos in flickr
server and retrieving them sequentially is not realistic.
Any ideas on how this can be done?

It probably means that flickr is not a good example of a nice fdw.

Neither of you are being very creative. As I mentioned above, you need
to embed this sort of stuff in table options.

so you would have something like:

   create foreign table panda_flickr (photo bytea, ...)
        server flickr_server
        options (searchterm 'panda', maxrows '50');
   select photo from panda_flickr;

This would work but means you need to create a new foreign table to
search something else.

So, yeah, it works, but it's not convenient.

The other possibility is that you can dig down into the ForiegnScanState object. The FDW routines are passed a ForeignScanState object which contains a ScanState object which in turn contains a PlanState object which has a list of quals. You probably need to dig quite a bit further but that's a start.

I think this is a common issue for all FDWs that need to access remote resources over a network. For example, if there were a Mysql FDW, a full table scan implies the whole table will be transferred over. 
it is not quite efficient for large tables. 

If a table size is 1GB, iterating the whole table row by row means those 1GB of data needs to be transferred over. This may take hours even if for an sql statement as simple as  
SELECT * from table where id = 1;

cheers,
ZY




pgsql-students by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: foreign data wrappers
Next
From: Tomáš Pospíšil
Date:
Subject: GSoC 2011 - indexing XML data