Re: PG9.2 and FDW query planning. - Mailing list pgsql-hackers
From | Ronan Dunklau |
---|---|
Subject | Re: PG9.2 and FDW query planning. |
Date | |
Msg-id | 4FFECDAA.80107@gmail.com Whole thread Raw |
In response to | Re: PG9.2 and FDW query planning. (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 05/07/2012 18:30, Tom Lane wrote: > Ronan Dunklau <rdunklau@gmail.com> writes: >> Let's say I have an IMAP foreign data wrapper, and I write a >> query joining the table on itself using the In-Reply-To and >> Message-ID headers, is there anything I can do to avoid fetching >> all the mails from the remote server ? > >> If I could somehow inform the planner that it can look up rows >> by message-id, thus avoiding the need to fetch everything from >> the remote server. Perhaps "persuading" the planner to use a >> nested-loop ? > > OK, so what you're saying is that the imap server can effectively > provide an index on message_id. What you'd do is create a > parameterized path that uses the tbl.message_id = > other_tbl.in_reply_to join clause. If that's enough cheaper than a > full scan, the planner would select it. Thank you, I was able to build such paths from your indication. The python FDW implementor can optionally give a list of tuples consisting of (path key, expected_number_of_row). So, in the imap example that could be [('Message-ID', 1), ('From', 1000)] for example. - From this information, if there is an equivalence class which restrictinfo uses one of those keys, we build a parameterized path, with an associated cost of base_width * expected_number_of_row, in addition to the generic, unparameterized path. The planner can then select this path, and build plans looking like this: postgres=# explain select m1."From", m1."To", m2."From", m2."To" from mails m1 inner join mails m2 on m2."Message-ID" = m1."In-Reply-To" where m1."From" = '%test@example.com%'; QUERY PLAN - --------------------------------------------------------------------Nested Loop (cost=10.00..60001000.00 rows=500000000width=128) -> Foreign Scan on mails m1 (cost=0.00..30000000.00 rows=100000 width=300) Filter: (("From")::text = '%test@example.com%'::text) -> Foreign Scan on mails m2 (cost=10.00..300.00rows=1 width=300) Filter: (("Message-ID")::text = (m1."In-Reply-To")::text) If I understand it correctly, after returning a ForeignScan (from GetForeignPlan), the planner decides to use a nestloop, and in the process of creating the nestloop plan, replaces Var nodes coming from the outerel (here, m1."In-Reply-To") by params nodes. My current implementation already looks for (var = param) expressions that it may handle during the plan phase and stores the association between the var and the param_id. At execution time, the needed parameters values are fetched (from the ParamExecData array found in es_param_exec_vals) and passed to the python foreign data wrapper. The problem I have: how can I retrieve the generated params and keep the association between the var and the param ? Should I replace the (var = outervar) clauses by (var = param) myself and store them in the fdw_exprs field of the foreign scan ? > FWIW, I'm not sure that it's sane to try to expose this stuff to > python yet. It's complicated and still something of a moving > target. Once we've got a few more C-coded FDWs that can do this > type of optimization, things might be stable enough that it'd be > useful to try to provide a high-level API. The current API (as mentioned above) would be more declarative than anything, only offering a way to (maybe) build parameterized paths without guaranteeing anything. Even if the internals change, I fail to see how it can hurt to offer such a feature. Regards, - -- Ronan Dunklau -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAEBAgAGBQJP/s2dAAoJECTYLCgFy3239KkIAIiKJo/F1r4Yp49wLpmThjQI ICo910ZajqlUKVsl9ye8m2l6p+lyGEmZMWUAWP6ae2pqFR+aC0zThypjF1faZ9tN HfqMbEKx/trkDf05U28tJlvOeu21tiEOEs4n02fmfdHu9SvemuLdyhU3dOLxoBVK ZZ8ra9q/+zHCPpc3zt0Mow80Q1X1M3DtirsHPoeIdOK69wD4nD2ZfhQule5HaoV1 dG3FlrKGAGzRpohLBCuWzyGPcWCS584lXGWfhsz/waLaSDIjcjvaaMke54eaa8Ci 7KxXkMM12CKFQyheSR5VVwFJrobnME2HDiJCoAOkRc0dW+Y2aASJnKG/FwL8C7s= =4RjB -----END PGP SIGNATURE-----
pgsql-hackers by date: