Re: PG9.2 and FDW query planning. - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: PG9.2 and FDW query planning.
Date
Msg-id 4FFECDAA.80107@gmail.com
Whole thread Raw
In response to Re: PG9.2 and FDW query planning.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/07/2012 18:30, Tom Lane wrote:
> Ronan Dunklau <rdunklau@gmail.com> writes:
>> Let's say I have an IMAP foreign data wrapper, and I write a
>> query joining the table on itself using the In-Reply-To and
>> Message-ID headers, is there anything I can do to avoid fetching
>> all the mails from the remote server ?
> 
>> If I could somehow inform the planner that it can look up rows
>> by message-id, thus avoiding the need to fetch everything from
>> the remote server. Perhaps "persuading" the planner to use a
>> nested-loop ?
> 
> OK, so what you're saying is that the imap server can effectively 
> provide an index on message_id.  What you'd do is create a
> parameterized path that uses the tbl.message_id =
> other_tbl.in_reply_to join clause. If that's enough cheaper than a
> full scan, the planner would select it.

Thank you, I was able to build such paths from your indication.

The python FDW implementor can optionally give a list of tuples
consisting of (path key, expected_number_of_row). So, in the imap
example that could be [('Message-ID', 1), ('From', 1000)] for example.

- From this information, if there is an equivalence class which
restrictinfo uses one of those keys, we build a parameterized path,
with an associated cost of base_width * expected_number_of_row, in
addition to the generic, unparameterized path.

The planner can then select this path, and build plans looking like this:

postgres=# explain select m1."From",      m1."To",      m2."From",      m2."To"
from mails m1 inner join mails m2 on m2."Message-ID" = m1."In-Reply-To"
where m1."From" = '%test@example.com%';
                       QUERY PLAN
- --------------------------------------------------------------------Nested Loop  (cost=10.00..60001000.00
rows=500000000width=128)  ->  Foreign Scan on mails m1  (cost=0.00..30000000.00 rows=100000
 
width=300)        Filter: (("From")::text = '%test@example.com%'::text)  ->  Foreign Scan on mails m2
(cost=10.00..300.00rows=1 width=300)        Filter: (("Message-ID")::text = (m1."In-Reply-To")::text)
 


If I understand it correctly, after returning a ForeignScan (from
GetForeignPlan), the planner decides to use a nestloop, and in the
process of creating the nestloop plan, replaces Var nodes coming from
the outerel (here, m1."In-Reply-To") by params nodes.

My current implementation already looks for (var = param) expressions
that it may handle  during the plan phase and stores the association
between the var and the param_id.
At execution time, the needed parameters values are fetched (from the
ParamExecData array found in es_param_exec_vals) and passed to the
python foreign data wrapper.

The problem I have: how can I retrieve the generated params and keep
the association between the var and the param ?

Should I replace the (var = outervar) clauses by (var = param) myself
and store them in the fdw_exprs field of the foreign scan ?

> FWIW, I'm not sure that it's sane to try to expose this stuff to
> python yet.  It's complicated and still something of a moving
> target.  Once we've got a few more C-coded FDWs that can do this
> type of optimization, things might be stable enough that it'd be
> useful to try to provide a high-level API.

The current API (as mentioned above) would be more declarative than
anything, only offering a way to (maybe) build parameterized paths
without guaranteeing anything. Even if the internals change, I fail to
see how it can hurt to offer such a feature.


Regards,

- -- 
Ronan Dunklau


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)

iQEcBAEBAgAGBQJP/s2dAAoJECTYLCgFy3239KkIAIiKJo/F1r4Yp49wLpmThjQI
ICo910ZajqlUKVsl9ye8m2l6p+lyGEmZMWUAWP6ae2pqFR+aC0zThypjF1faZ9tN
HfqMbEKx/trkDf05U28tJlvOeu21tiEOEs4n02fmfdHu9SvemuLdyhU3dOLxoBVK
ZZ8ra9q/+zHCPpc3zt0Mow80Q1X1M3DtirsHPoeIdOK69wD4nD2ZfhQule5HaoV1
dG3FlrKGAGzRpohLBCuWzyGPcWCS584lXGWfhsz/waLaSDIjcjvaaMke54eaa8Ci
7KxXkMM12CKFQyheSR5VVwFJrobnME2HDiJCoAOkRc0dW+Y2aASJnKG/FwL8C7s=
=4RjB
-----END PGP SIGNATURE-----


pgsql-hackers by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: Synchronous Standalone Master Redoux
Next
From: Shaun Thomas
Date:
Subject: Re: Synchronous Standalone Master Redoux