Re: Getting sorted data from foreign server for merge join - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Getting sorted data from foreign server for merge join
Date
Msg-id CA+TgmoaWP05mn7Eo1EHnYn+qgJc_tSp-w_+O4O7707R+uTrV=w@mail.gmail.com
Whole thread Raw
In response to Re: Getting sorted data from foreign server for merge join  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: Getting sorted data from foreign server for merge join  (Rushabh Lathia <rushabh.lathia@gmail.com>)
Re: Getting sorted data from foreign server for merge join  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
On Thu, Dec 17, 2015 at 3:32 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Dec 9, 2015 at 12:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Dec 2, 2015 at 6:45 AM, Rushabh Lathia <rushabh.lathia@gmail.com>
>> wrote:
>> > Thanks Ashutosh.
>> >
>> > Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
>> > looks good to me.
>>
>> This patch needs a rebase.
>
> Done.

Thanks.

>> It's not going to work to say this is a patch proposed for commit when
>> it's still got a TODO comment in it that obviously needs to be
>> changed.   And the formatting of that long comment is pretty weird,
>> too, and not consistent with other functions in that same file (e.g.
>> get_remote_estimate, ec_member_matches_foreign, create_cursor).
>>
>
> The TODO was present in v4 but not in v5 and is not present in v6 attached
> here.. Formatted comment according estimate_path_cost_size(),
> convert_prep_stmt_params().

Hrm, I must have been looking at the wrong version somehow.  Sorry about that.

>> Aside from that, I think before we commit this, somebody should do
>> some testing that demonstrates that this is actually a good idea.  Not
>> as part of the test case set for this patch, but just in general.
>> Merge joins are typically going to be relevant for large tables, but
>> the examples in the regression tests are necessarily tiny.  I'd like
>> to see some sample data and some sample queries that get appreciably
>> faster with this code.  If we can't find any, we don't need the code.
>>
>
> I tested the patch on my laptop with two types of queries, a join between
> two foreign tables on different foreign servers (pointing to the same self
> server) and a join between one foreign and one local table. The foreign
> tables and servers are created using sort_pd_setup.sql attached. Foreign
> tables pointed to table with index useful for join clause. Both the joining
> tables had 10M rows. The execution time of query was measured for 100 runs
> and average and standard deviation were calculated (using function
> query_execution_stats() in script sort_pd.sql) and are presented below.

OK, cool.

I went over this patch in some detail today and did a lot of cosmetic
cleanup.  The results are attached.  I'm fairly happy with this
version, but let me know what you think.  Of course, feedback from
others is more than welcome also.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Making tab-complete.c easier to maintain
Next
From: Robert Haas
Date:
Subject: Re: Using quicksort for every external sort run