Re: Getting sorted data from foreign server - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Getting sorted data from foreign server
Date
Msg-id CA+TgmoYbO2TZ3JQVdrsLUXjc1YWeZEGETw3Q1cvn4GnbY81Ymw@mail.gmail.com
Whole thread Raw
In response to Re: Getting sorted data from foreign server  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: Getting sorted data from foreign server  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
On Thu, Oct 15, 2015 at 6:28 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> Attached is the patch which takes care of above comments.

I spent some time on this patch today.  But it's still not right.

I've attached a new version which fixes a serious problem with your
last version - having postgresGetForeignPaths do the costing of the
sorted path itself instead of delegating that to
estimate_path_cost_size is wrong.  In your version, 10% increment gets
applied to the network transmission costs as well as the cost of
generating the tupes - but only when use_remote_estimate == false.  I
fixed this and did some cosmetic cleanup.

But you'll notice if you try this some of postgres_fdw's regression
tests fail.  This is rather mysterious:

***************
*** 697,715 ****
   Sort
     Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
     Sort Key: t1.c1
!    ->  Nested Loop Semi Join
           Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
!          Join Filter: (t1.c3 = t2.c3)
           ->  Foreign Scan on public.ft1 t1
                 Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8
FROM "S 1"."T 1" WHERE (("C 1" < 20))
!          ->  Materialize
                 Output: t2.c3
!                ->  Foreign Scan on public.ft2 t2
                       Output: t2.c3
!                      Filter: (date(t2.c4) = '01-17-1970'::date)
!                      Remote SQL: SELECT c3, c4 FROM "S 1"."T 1"
WHERE (("C 1" > 10))
! (15 rows)

  EXECUTE st2(10, 20);
   c1 | c2 |  c3   |              c4              |            c5
      | c6 |     c7     | c8
--- 697,718 ----
   Sort
     Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
     Sort Key: t1.c1
!    ->  Hash Join
           Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
!          Hash Cond: (t1.c3 = t2.c3)
           ->  Foreign Scan on public.ft1 t1
                 Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8
FROM "S 1"."T 1" WHERE (("C 1" < 20))
!          ->  Hash
                 Output: t2.c3
!                ->  HashAggregate
                       Output: t2.c3
!                      Group Key: t2.c3
!                      ->  Foreign Scan on public.ft2 t2
!                            Output: t2.c3
!                            Filter: (date(t2.c4) = '01-17-1970'::date)
!                            Remote SQL: SELECT c3, c4 FROM "S 1"."T
1" WHERE (("C 1" > 10))
! (18 rows)

What I think is happening here is that the planner notices that
instead of doing a parameterized nestloop, it could pull down the data
already sorted from the remote side, cheaply unique-ify it by using
the ordering provided by the remote side, and then do a standard hash
join.  That might well be a sensible approach, but the ORDER BY that
would make it correct doesn't show up in the Remote SQL.  I don't know
why that's happening, but it's not good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: "Jinyu Zhang"
Date:
Subject: Re: Patch: Optimize memory allocation in function 'bringetbitmap'
Next
From: Andrew Dunstan
Date:
Subject: Re: buildfarm failures on crake and sittella