Thread: Postgres_FDW optimizations

Postgres_FDW optimizations

From
cevian
Date:
Hi all,

I have a question about postgres_fdw optimizations/pushdown:

I have the following code running on 9.5beta2 (same format as
previous/related message for consistency)
CREATE EXTENSION postgres_fdw;
CREATE SERVER loop foreign data wrapper postgres_fdw
  OPTIONS (port '5432', dbname 'testdb');
CREATE USER MAPPING FOR PUBLIC SERVER loop;

create table onemillion (
    id serial primary key,
    inserted timestamp default clock_timestamp(),
    data text
);

insert into onemillion(data) select random() from 
generate_series(1,1000000); 

CREATE FOREIGN TABLE onemillion_pgfdw (
    id int,
    inserted timestamp,
    data text
) SERVER loop
OPTIONS (table_name 'onemillion',
         use_remote_estimate 'true');

explain verbose select * from onemillion_pgfdw order by id limit 1;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=43434.00..43434.00 rows=1 width=30)
   Output: id, inserted, data
   ->  Sort  (cost=43434.00..45934.00 rows=1000000 width=30)
         Output: id, inserted, data
         Sort Key: onemillion_pgfdw.id
         ->  Foreign Scan on public.onemillion_pgfdw  (cost=100.00..38434.00 rows=1000000 width=30)
               Output: id, inserted, data
               Remote SQL: SELECT id, inserted, data FROM public.onemillion

This is obviously highly inefficient. The sort and limit should be pushed
down to the foreign node, especially on such a simple query. I have 3
questions:

1) Is this the expected state of the fdw optimizations for now, or is it a
bug?
2) Is anybody working on this type of pushdown right now? (I would be more
than willing to collaborate on a patch)
3) Is it possible to fix this with views/rules/triggers/a different query? I
couldn't find a way. Relatedly, is there a way to explicitly specify the
remote query to run through the fdw?

Thanks,
Matvey Arye
Iobeam, Inc.




--
View this message in context: http://postgresql.nabble.com/Postgres-FDW-optimizations-tp5875911.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Postgres_FDW optimizations

From
Julien Rouhaud
Date:
On 02/12/2015 20:25, cevian wrote:
> Hi all,
> 

Hello,

> I have a question about postgres_fdw optimizations/pushdown:
> 
> I have the following code running on 9.5beta2 (same format as
> previous/related message for consistency)
> CREATE EXTENSION postgres_fdw; 
> CREATE SERVER loop foreign data wrapper postgres_fdw 
>   OPTIONS (port '5432', dbname 'testdb'); 
> CREATE USER MAPPING FOR PUBLIC SERVER loop; 
> 
> create table onemillion ( 
>     id serial primary key, 
>     inserted timestamp default clock_timestamp(), 
>     data text 
> ); 
> 
> insert into onemillion(data) select random() from 
> generate_series(1,1000000); 
> 
> CREATE FOREIGN TABLE onemillion_pgfdw ( 
>     id int, 
>     inserted timestamp, 
>     data text 
> ) SERVER loop 
> OPTIONS (table_name 'onemillion', 
>          use_remote_estimate 'true'); 
> 
> explain verbose select * from onemillion_pgfdw order by id limit 1;
>                                              QUERY PLAN
> ----------------------------------------------------------------------------------------------------
>  Limit  (cost=43434.00..43434.00 rows=1 width=30)
>    Output: id, inserted, data
>    ->  Sort  (cost=43434.00..45934.00 rows=1000000 width=30)
>          Output: id, inserted, data
>          Sort Key: onemillion_pgfdw.id
>          ->  Foreign Scan on public.onemillion_pgfdw  (cost=100.00..38434.00
> rows=1000000 width=30)
>                Output: id, inserted, data
>                Remote SQL: SELECT id, inserted, data FROM public.onemillion
> 
> This is obviously highly inefficient. The sort and limit should be pushed
> down to the foreign node, especially on such a simple query. I have 3
> questions:
> 
> 1) Is this the expected state of the fdw optimizations for now, or is it a
> bug?
> 2) Is anybody working on this type of pushdown right now? (I would be more
> than willing to collaborate on a patch)

The sort pushdown for postgres_fdw was committed a few weeks ago
for 9.6, see
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f18c944b6137329ac4a6b2dce5745c5dc21a8578
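
As a rough sketch (not output from an actual run), with that commit the
ORDER BY should be forwarded to the remote server, so the plan for the
query above would be expected to contain a Remote SQL line roughly like:

 Remote SQL: SELECT id, inserted, data FROM public.onemillion ORDER BY id ASC NULLS LAST

The Limit would still be applied locally, but since postgres_fdw fetches
rows from the remote cursor in batches, only the first batch should need
to be transferred.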

> 3) Is it possible to fix this with views/rules/triggers/a different query? I
> couldn't find a way. Relatedly, is there a way to explicitly specify the
> remote query to run through the fdw?
> 

For now, I don't see any other solution than executing a remote query
with the dblink extension:
http://www.postgresql.org/docs/current/static/contrib-dblink-function.html
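
For example (a hand-written sketch, not tested, reusing the connection
settings from your CREATE SERVER statement), after CREATE EXTENSION dblink
something like this should run the sort and limit entirely on the remote
side:

SELECT *
FROM dblink('port=5432 dbname=testdb',
            'SELECT id, inserted, data FROM onemillion ORDER BY id LIMIT 1')
  AS t(id int, inserted timestamp, data text);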

Regards.

> Thanks,
> Matvey Arye
> Iobeam, Inc.


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Postgres_FDW optimizations

From
Ashutosh Bapat
Date:


On Thu, Dec 3, 2015 at 12:55 AM, cevian <cevian@gmail.com> wrote:
Hi all,

I have a question about postgres_fdw optimizations/pushdown:

I have the following code running on 9.5beta2 (same format as
previous/related message for consistency)
CREATE EXTENSION postgres_fdw;
CREATE SERVER loop foreign data wrapper postgres_fdw
  OPTIONS (port '5432', dbname 'testdb');
CREATE USER MAPPING FOR PUBLIC SERVER loop;

create table onemillion (
    id serial primary key,
    inserted timestamp default clock_timestamp(),
    data text
);

insert into onemillion(data) select random() from
generate_series(1,1000000);

CREATE FOREIGN TABLE onemillion_pgfdw (
    id int,
    inserted timestamp,
    data text
) SERVER loop
OPTIONS (table_name 'onemillion',
         use_remote_estimate 'true');

explain verbose select * from onemillion_pgfdw order by id limit 1;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=43434.00..43434.00 rows=1 width=30)
   Output: id, inserted, data
   ->  Sort  (cost=43434.00..45934.00 rows=1000000 width=30)
         Output: id, inserted, data
         Sort Key: onemillion_pgfdw.id
         ->  Foreign Scan on public.onemillion_pgfdw  (cost=100.00..38434.00
rows=1000000 width=30)
               Output: id, inserted, data
               Remote SQL: SELECT id, inserted, data FROM public.onemillion

This is obviously highly inefficient. The sort and limit should be pushed
down to the foreign node, especially on such a simple query. I have 3
questions:


The patch for sort pushdown was committed a few weeks ago and will be available in 9.6. Let me know what you see when you execute the query on the current development branch. Limit support will be added eventually, but the timeline is not clear yet.
 
1) Is this the expected state of the fdw optimizations for now, or is it a
bug?

For 9.5 this is expected behaviour and not a bug.
 
2) Is anybody working on this type of pushdown right now? (I would be more
than willing to collaborate on a patch)

There are a few people working on this: you will see Hanada-san, Horiguchi-san, Fujita-san, myself and Robert working on it mostly, but there are other contributors too, so forgive me if I have missed any. You are welcome to join in. Right now we are concentrating on join pushdown, DML pushdown and asynchronous query execution.
 
3) Is it possible to fix this with views/rules/triggers/a different query? I
couldn't find a way. Relatedly, is there a way to explicitly specify the
remote query to run through the fdw?

The dblink module can be of help here.
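
As a sketch of how that could tie into the views idea from question 3
(hypothetical object names, assuming the dblink extension is installed and
the same connection settings as above), the remote query, including its
ORDER BY and LIMIT, can be wrapped in a local view:

CREATE VIEW onemillion_first AS
  SELECT *
  FROM dblink('port=5432 dbname=testdb',
              'SELECT id, inserted, data FROM onemillion ORDER BY id LIMIT 1')
    AS t(id int, inserted timestamp, data text);

The remote query is fixed inside the view, so this only helps for queries
known up front; it is a workaround rather than real pushdown.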
 

Thanks,
Matvey Arye
Iobeam, Inc.







--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company