-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 05/07/2012 18:30, Tom Lane wrote:
> Ronan Dunklau <rdunklau@gmail.com> writes:
>> Let's say I have an IMAP foreign data wrapper, and I write a
>> query joining the table on itself using the In-Reply-To and
>> Message-ID headers, is there anything I can do to avoid fetching
>> all the mails from the remote server ?
>
>> If I could somehow inform the planner that it can look up rows
>> by message-id, thus avoiding the need to fetch everything from
>> the remote server. Perhaps "persuading" the planner to use a
>> nested-loop ?
>
> OK, so what you're saying is that the imap server can effectively
> provide an index on message_id. What you'd do is create a
> parameterized path that uses the tbl.message_id =
> other_tbl.in_reply_to join clause. If that's enough cheaper than a
> full scan, the planner would select it.
Thank you, I was able to build such paths from your indication.
The python FDW implementor can optionally give a list of tuples
consisting of (path key, expected_number_of_row). So, in the imap
example that could be [('Message-ID', 1), ('From', 1000)] for example.
- From this information, if there is an equivalence class which
restrictinfo uses one of those keys, we build a parameterized path,
with an associated cost of base_width * expected_number_of_row, in
addition to the generic, unparameterized path.
The planner can then select this path, and build plans looking like this:
postgres=# explain select m1."From", m1."To", m2."From", m2."To"
from mails m1 inner join mails m2 on m2."Message-ID" = m1."In-Reply-To"
where m1."From" = '%test@example.com%';
QUERY PLAN
- --------------------------------------------------------------------Nested Loop (cost=10.00..60001000.00
rows=500000000width=128) -> Foreign Scan on mails m1 (cost=0.00..30000000.00 rows=100000
width=300) Filter: (("From")::text = '%test@example.com%'::text) -> Foreign Scan on mails m2
(cost=10.00..300.00rows=1 width=300) Filter: (("Message-ID")::text = (m1."In-Reply-To")::text)
If I understand it correctly, after returning a ForeignScan (from
GetForeignPlan), the planner decides to use a nestloop, and in the
process of creating the nestloop plan, replaces Var nodes coming from
the outerel (here, m1."In-Reply-To") by params nodes.
My current implementation already looks for (var = param) expressions
that it may handle during the plan phase and stores the association
between the var and the param_id.
At execution time, the needed parameters values are fetched (from the
ParamExecData array found in es_param_exec_vals) and passed to the
python foreign data wrapper.
The problem I have: how can I retrieve the generated params and keep
the association between the var and the param ?
Should I replace the (var = outervar) clauses by (var = param) myself
and store them in the fdw_exprs field of the foreign scan ?
> FWIW, I'm not sure that it's sane to try to expose this stuff to
> python yet. It's complicated and still something of a moving
> target. Once we've got a few more C-coded FDWs that can do this
> type of optimization, things might be stable enough that it'd be
> useful to try to provide a high-level API.
The current API (as mentioned above) would be more declarative than
anything, only offering a way to (maybe) build parameterized paths
without guaranteeing anything. Even if the internals change, I fail to
see how it can hurt to offer such a feature.
Regards,
- --
Ronan Dunklau
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQEcBAEBAgAGBQJP/s2dAAoJECTYLCgFy3239KkIAIiKJo/F1r4Yp49wLpmThjQI
ICo910ZajqlUKVsl9ye8m2l6p+lyGEmZMWUAWP6ae2pqFR+aC0zThypjF1faZ9tN
HfqMbEKx/trkDf05U28tJlvOeu21tiEOEs4n02fmfdHu9SvemuLdyhU3dOLxoBVK
ZZ8ra9q/+zHCPpc3zt0Mow80Q1X1M3DtirsHPoeIdOK69wD4nD2ZfhQule5HaoV1
dG3FlrKGAGzRpohLBCuWzyGPcWCS584lXGWfhsz/waLaSDIjcjvaaMke54eaa8Ci
7KxXkMM12CKFQyheSR5VVwFJrobnME2HDiJCoAOkRc0dW+Y2aASJnKG/FwL8C7s=
=4RjB
-----END PGP SIGNATURE-----