Re: Getting sorted data from foreign server - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Getting sorted data from foreign server |
Date | |
Msg-id | CA+TgmoYfH9jETDL-sxKPS+YCPDTSOth20_rOYVHFhc9xctRbWQ@mail.gmail.com Whole thread Raw |
In response to | Getting sorted data from foreign server (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: Getting sorted data from foreign server
(Jeremy Harris <jgh@wizmail.org>)
Re: Getting sorted data from foreign server (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
List | pgsql-hackers |
On Tue, Oct 6, 2015 at 6:46 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > standard_qp_callback() sets root->query_pathkeys to pathkeys on which the > result of query_planner are expected be sorted upon (see the function for > more details). The patch checks if any prefix of these pathkeys can be > entirely evaluated using the foreign relation and at the foreign server > under consideration. If yes, it gets estimates of costs involved and adds > paths with those pathkeys. There can be multiple pathkeyless paths added for > a given base relation. For every such path one path with pathkeys is added. > If there is an index matching on the foreign server, getting the data sorted > from foreign server improves execution time as seen from the results. The > patch adds this functionality entirely in postgres_fdw. In the interest of full disclosure, I asked Ashutosh to work on this patch and have discussed the design with him several times. I believe that this is a good direction for PostgreSQL to be going. It's trivially easy right now to write a query against an FDW that performs needlessly easy, because a join, or a sort, or an aggregate is performed on the local server rather than the remote one. I, and EnterpriseDB, want that to get fixed. However, we also want it to get fixed in the best possible way, and not to do anything unless there is consensus on it. So, if anyone has opinions on this topic, please jump in. That having been said, here are some review comments on this patch: - You consider pushing down ORDER BY if any prefix of the query pathkeys satisfy is_foreign_expr(), but that doesn't seem right to me. If the user says SELECT * FROM remotetab ORDER BY a, unsafe(a), ordering the result set by a doesn't help us much. We've talked a few times about an incremental sort capability that would take a stream of tuples already ordered by one or more columns and sort each group by additional columns, but I don't think we have that currently. Without that capability, I don't think there's much benefit in sorting by a prefix of the pathkeys. I suspect that if we can't get them all, it's not worth doing. - Right now, you have this code below the point where we bail out if use_remote_estimate is not set. If we keep it like that, the comment needs updating. But I suggest that we consider an ordered path even if we are not using remote estimates. Estimate the sorted path to cost somewhat more than the unsorted path, so that we only choose that path if the sort actually benefits us. I don't know exactly how to come up with a principled estimate given that we don't actually know whether the remote side will need an extra sort or not, but maybe a dumb estimate is still better than not trying a sorted path. - In the long run, we should probably either add some configuration so that the local side can make better estimates even without use_remote_estimate, or maybe have a way for the FDW to keep a cache of data in the system catalogs that is updated by ANALYZE. Then, analyzing a foreign table could store information locally about indexes and so forth, instead of starting each query planning cycle with no knowledge about the remote side. That's not a matter for this patch, I don't think, but it seems like something we should do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: