Getting sorted data from foreign server - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Getting sorted data from foreign server
Date
Msg-id CAFjFpRd4kdRuP3opeGyzzeUygkZyY-JUCg45u7itgW0Nu-0M_A@mail.gmail.com
Whole thread Raw
Responses Re: Getting sorted data from foreign server  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi All,
standard_qp_callback() sets root->query_pathkeys to pathkeys on which the result of query_planner are expected be sorted upon (see the function for more details). The patch checks if any prefix of these pathkeys can be entirely evaluated using the foreign relation and at the foreign server under consideration. If yes, it gets estimates of costs involved and adds paths with those pathkeys. There can be multiple pathkeyless paths added for a given base relation. For every such path one path with pathkeys is added. If there is an index matching on the foreign server, getting the data sorted from foreign server improves execution time as seen from the results. The patch adds this functionality entirely in postgres_fdw.

For a postgres_fdw foreign table ft1(val int, val2 int), with the patch

EXPLAIN VERBOSE SELECT * FROM ft1 ORDER BY val; gives
                               QUERY PLAN   
------------------------------------------------------------------------
 Foreign Scan on public.ft1  (cost=100.29..6480.42 rows=100118 width=8)
   Output: val, val2
   Remote SQL: SELECT val, val2 FROM public.lt ORDER BY val ASC
(3 rows)

observe that the query sent to the foreign server has ORDER BY clause in it. The test script attached has more examples of the same. The patch adds a small test case.

Results
------------
Attached find the script used to measure the performance. The script creates a foreign server and foreign table pointing to the local server and local table resp. The test runs three different types of queries (simple sort, group by, sorted result from inheritance hierarchy) multiple times and calculates the average execution time for each query with and without the patch. The performance is measured for foreign table (ft1 above) populated with 100 rows (in-memory sorting) and with 100000 rows (external sorting) resp. The output of the script with and without patch and with different sizes of foreign table is attached here.

We can observe following
1. For large number of rows (when the memory is not enough to hold all the data to be sorted) we see 20-25% reduction in the query execution time when there is matching index on the foreign server.

2. For very small number of rows (when the memory is enough to hold all the data to be sorted) there is not much performance gain and sometimes the planner is not choosing the path with pathkeys for foreign scans.

3. In all the cases, the planning time increases owing to EXPLAIN queries fired on the foreign server.


--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment

pgsql-hackers by date:

Previous
From: "Syed, Rahila"
Date:
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Next
From: "Syed, Rahila"
Date:
Subject: Re: [PROPOSAL] VACUUM Progress Checker.