Thread: [HACKERS] An issue in remote query optimization

[HACKERS] An issue in remote query optimization

From
Abbas Butt
Date:
Hi,
Postgres_fdw optimizes remote queries by pushing down the where clause.
This feature does not work consistently when the query is executed from within a pl/pgsql function. The optimization works when the function executes the query for the first 5 times, and fails afterwards.

Example:
Step 1:
Create the table on the foreign server and insert some rows in it
create table numbers(a int, b varchar(255));
insert into numbers values(1, 'One');
insert into numbers values(2, 'Two');
insert into numbers values(3, 'Three');
insert into numbers values(4, 'Four');
insert into numbers values(5, 'Five');
insert into numbers values(6, 'Six');
insert into numbers values(7, 'Seven');
insert into numbers values(8, 'Eight');
insert into numbers values(9, 'Nine');

Step 2:
Create the following objects on the local server
CREATE SERVER pg_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '127.0.0.1', port '5552', dbname 'test');
CREATE USER MAPPING FOR abbasbutt SERVER pg_server OPTIONS (user 'abbasbutt', password 'abc123');
CREATE FOREIGN TABLE foreign_numbers(a int, b varchar(255)) SERVER pg_server OPTIONS (table_name 'numbers');

create or replace function test_pg_fdw() returns void as $$
DECLARE
     n varchar;
BEGIN
         FOR x IN 1..9 LOOP
                 select b into n from foreign_numbers where a=x;
                 raise notice 'Found number %', n;
         end loop;
         return;
END
$$ LANGUAGE plpgsql;

Step 3:
Run the test:
select test_pg_fdw();

Step 4:
Check the output of auto_explain in the local server log

2017-01-31 00:39:25 PST LOG:  duration: 8.388 ms  plan:
    Query Text: select b        from foreign_numbers where a=x
    Foreign Scan on public.foreign_numbers  (cost=100.00..111.91 rows=1 width=516)
      Output: b
      Remote SQL: SELECT b FROM public.numbers WHERE ((a = 1))
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select b        from foreign_numbers where a=x"
    PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.315 ms  plan:
    Query Text: select b        from foreign_numbers where a=x
    Foreign Scan on public.foreign_numbers  (cost=100.00..111.91 rows=1 width=516)
      Output: b
      Remote SQL: SELECT b FROM public.numbers WHERE ((a = 2))
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select b        from foreign_numbers where a=x"
    PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.250 ms  plan:
    Query Text: select b        from foreign_numbers where a=x
    Foreign Scan on public.foreign_numbers  (cost=100.00..111.91 rows=1 width=516)
      Output: b
      Remote SQL: SELECT b FROM public.numbers WHERE ((a = 3))
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select b        from foreign_numbers where a=x"
    PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.257 ms  plan:
    Query Text: select b        from foreign_numbers where a=x
    Foreign Scan on public.foreign_numbers  (cost=100.00..111.91 rows=1 width=516)
      Output: b
      Remote SQL: SELECT b FROM public.numbers WHERE ((a = 4))
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select b        from foreign_numbers where a=x"
    PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.271 ms  plan:
    Query Text: select b        from foreign_numbers where a=x
    Foreign Scan on public.foreign_numbers  (cost=100.00..111.91 rows=1 width=516)
      Output: b
      Remote SQL: SELECT b FROM public.numbers WHERE ((a = 5))
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select b        from foreign_numbers where a=x"
    PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.251 ms  plan:
    Query Text: select b        from foreign_numbers where a=x
    Foreign Scan on public.foreign_numbers  (cost=100.00..114.91 rows=1 width=516)
      Output: b
      Filter: (foreign_numbers.a = $3)
      Remote SQL: SELECT a, b FROM public.numbers
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select b        from foreign_numbers where a=x"
    PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.246 ms  plan:
    Query Text: select b        from foreign_numbers where a=x
    Foreign Scan on public.foreign_numbers  (cost=100.00..114.91 rows=1 width=516)
      Output: b
      Filter: (foreign_numbers.a = $3)
      Remote SQL: SELECT a, b FROM public.numbers
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select b        from foreign_numbers where a=x"
    PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.226 ms  plan:
    Query Text: select b        from foreign_numbers where a=x
    Foreign Scan on public.foreign_numbers  (cost=100.00..114.91 rows=1 width=516)
      Output: b
      Filter: (foreign_numbers.a = $3)
      Remote SQL: SELECT a, b FROM public.numbers
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select b        from foreign_numbers where a=x"
    PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.223 ms  plan:
    Query Text: select b        from foreign_numbers where a=x
    Foreign Scan on public.foreign_numbers  (cost=100.00..114.91 rows=1 width=516)
      Output: b
      Filter: (foreign_numbers.a = $3)
      Remote SQL: SELECT a, b FROM public.numbers
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select b        from foreign_numbers where a=x"
    PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 13.963 ms  plan:
    Query Text: select test_pg_fdw();
    Result  (cost=0.00..0.26 rows=1 width=0)
      Output: test_pg_fdw()


Note that the remote query does not contain the WHERE clause after the 5th invocation.

I understand that this is because PostgreSQL starts using generic plan with pulled up where clause after the 5th invocation hoping that it would be faster since we have skiped planning the query on each invocation, but in this case this decision is causing the query to slow down.

How should we fix this problem?

--
Abbas
Architect
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter

@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars, whitepapers and more

Re: [HACKERS] An issue in remote query optimization

From
Etsuro Fujita
Date:
On 2017/01/31 18:24, Abbas Butt wrote:
> Postgres_fdw optimizes remote queries by pushing down the where clause.
> This feature does not work consistently when the query is executed from
> within a pl/pgsql function. The optimization works when the function
> executes the query for the first 5 times, and fails afterwards.

> I understand that this is because PostgreSQL starts using generic plan
> with pulled up where clause after the 5th invocation hoping that it
> would be faster since we have skiped planning the query on each
> invocation, but in this case this decision is causing the query to slow
> down.
>
> How should we fix this problem?

ANALYZE for the foreign table doesn't work?

Best regards,
Etsuro Fujita





Re: [HACKERS] An issue in remote query optimization

From
Abbas Butt
Date:


On Tue, Jan 31, 2017 at 2:25 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2017/01/31 18:24, Abbas Butt wrote:
Postgres_fdw optimizes remote queries by pushing down the where clause.
This feature does not work consistently when the query is executed from
within a pl/pgsql function. The optimization works when the function
executes the query for the first 5 times, and fails afterwards.

I understand that this is because PostgreSQL starts using generic plan
with pulled up where clause after the 5th invocation hoping that it
would be faster since we have skiped planning the query on each
invocation, but in this case this decision is causing the query to slow
down.

How should we fix this problem?

ANALYZE for the foreign table doesn't work?

No.

analyze ts.tickets;
WARNING:  skipping "tickets" --- cannot analyze this foreign table
ANALYZE

 

Best regards,
Etsuro Fujita





--
--
Abbas
Architect
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter

@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars, whitepapers and more

Re: [HACKERS] An issue in remote query optimization

From
Etsuro Fujita
Date:
On 2017/01/31 19:53, Abbas Butt wrote:
> On Tue, Jan 31, 2017 at 2:25 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
>     On 2017/01/31 18:24, Abbas Butt wrote:

>         Postgres_fdw optimizes remote queries by pushing down the where
>         clause.
>         This feature does not work consistently when the query is
>         executed from
>         within a pl/pgsql function. The optimization works when the function
>         executes the query for the first 5 times, and fails afterwards.

>         I understand that this is because PostgreSQL starts using
>         generic plan
>         with pulled up where clause after the 5th invocation hoping that it
>         would be faster since we have skiped planning the query on each
>         invocation, but in this case this decision is causing the query
>         to slow
>         down.

>         How should we fix this problem?

>     ANALYZE for the foreign table doesn't work?

> No.
>
> analyze ts.tickets;
> WARNING:  skipping "tickets" --- cannot analyze this foreign table
> ANALYZE

How the foreign table ts.tickets is defined?

Best regards,
Etsuro Fujita





Re: [HACKERS] An issue in remote query optimization

From
Abbas Butt
Date:


On Tue, Jan 31, 2017 at 3:15 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
On 2017/01/31 19:53, Abbas Butt wrote:
On Tue, Jan 31, 2017 at 2:25 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
    On 2017/01/31 18:24, Abbas Butt wrote:

        Postgres_fdw optimizes remote queries by pushing down the where
        clause.
        This feature does not work consistently when the query is
        executed from
        within a pl/pgsql function. The optimization works when the function
        executes the query for the first 5 times, and fails afterwards.

        I understand that this is because PostgreSQL starts using
        generic plan
        with pulled up where clause after the 5th invocation hoping that it
        would be faster since we have skiped planning the query on each
        invocation, but in this case this decision is causing the query
        to slow
        down.

        How should we fix this problem?

    ANALYZE for the foreign table doesn't work?

No.

analyze ts.tickets;
WARNING:  skipping "tickets" --- cannot analyze this foreign table
ANALYZE

How the foreign table ts.tickets is defined?

test=# \d ts.tickets
         Foreign table "ts.tickets"
 Column |  Type   | Modifiers | FDW Options
--------+---------+-----------+-------------
 id     | integer | not null  |
Server: mysql_server
FDW Options: (dbname 'msql_test_db', table_name 'tickets')

Its a foreign table, referring to table 'tickets' defined on MySQL.


Best regards,
Etsuro Fujita





--
--
Abbas
Architect
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter

@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars, whitepapers and more

Re: [HACKERS] An issue in remote query optimization

From
Ashutosh Bapat
Date:
On Tue, Jan 31, 2017 at 5:23 PM, Abbas Butt <abbas.butt@enterprisedb.com> wrote:
>
>
> On Tue, Jan 31, 2017 at 3:15 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
> wrote:
>>
>> On 2017/01/31 19:53, Abbas Butt wrote:
>>>
>>> On Tue, Jan 31, 2017 at 2:25 AM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
>>>     On 2017/01/31 18:24, Abbas Butt wrote:
>>
>>
>>>         Postgres_fdw optimizes remote queries by pushing down the where
>>>         clause.
>>>         This feature does not work consistently when the query is
>>>         executed from
>>>         within a pl/pgsql function. The optimization works when the
>>> function
>>>         executes the query for the first 5 times, and fails afterwards.
>>
>>
>>>         I understand that this is because PostgreSQL starts using
>>>         generic plan
>>>         with pulled up where clause after the 5th invocation hoping that
>>> it
>>>         would be faster since we have skiped planning the query on each
>>>         invocation, but in this case this decision is causing the query
>>>         to slow
>>>         down.
>>
>>
>>>         How should we fix this problem?
>>
>>
>>>     ANALYZE for the foreign table doesn't work?
>>
>>
>>> No.
>>>
>>> analyze ts.tickets;
>>> WARNING:  skipping "tickets" --- cannot analyze this foreign table
>>> ANALYZE
>>
>>
>> How the foreign table ts.tickets is defined?
>
>
> test=# \d ts.tickets
>          Foreign table "ts.tickets"
>  Column |  Type   | Modifiers | FDW Options
> --------+---------+-----------+-------------
>  id     | integer | not null  |
> Server: mysql_server
> FDW Options: (dbname 'msql_test_db', table_name 'tickets')
>
> Its a foreign table, referring to table 'tickets' defined on MySQL.
>
Isn't your original complaint about postgres_fdw? You can not tickets,
which is a mongo_fdw foreign table, is probably because mongo_fdw has
not implemented analyze FDW routine.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] An issue in remote query optimization

From
Abbas Butt
Date:
Sorry for the confusion.
ANALYZE works for the foreign table 'foreign_numbers'.
test=# analyze foreign_numbers;
ANALYZE
test=#



On Tue, Jan 31, 2017 at 5:04 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Tue, Jan 31, 2017 at 5:23 PM, Abbas Butt <abbas.butt@enterprisedb.com> wrote:
>
>
> On Tue, Jan 31, 2017 at 3:15 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
> wrote:
>>
>> On 2017/01/31 19:53, Abbas Butt wrote:
>>>
>>> On Tue, Jan 31, 2017 at 2:25 AM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
>>>     On 2017/01/31 18:24, Abbas Butt wrote:
>>
>>
>>>         Postgres_fdw optimizes remote queries by pushing down the where
>>>         clause.
>>>         This feature does not work consistently when the query is
>>>         executed from
>>>         within a pl/pgsql function. The optimization works when the
>>> function
>>>         executes the query for the first 5 times, and fails afterwards.
>>
>>
>>>         I understand that this is because PostgreSQL starts using
>>>         generic plan
>>>         with pulled up where clause after the 5th invocation hoping that
>>> it
>>>         would be faster since we have skiped planning the query on each
>>>         invocation, but in this case this decision is causing the query
>>>         to slow
>>>         down.
>>
>>
>>>         How should we fix this problem?
>>
>>
>>>     ANALYZE for the foreign table doesn't work?
>>
>>
>>> No.
>>>
>>> analyze ts.tickets;
>>> WARNING:  skipping "tickets" --- cannot analyze this foreign table
>>> ANALYZE
>>
>>
>> How the foreign table ts.tickets is defined?
>
>
> test=# \d ts.tickets
>          Foreign table "ts.tickets"
>  Column |  Type   | Modifiers | FDW Options
> --------+---------+-----------+-------------
>  id     | integer | not null  |
> Server: mysql_server
> FDW Options: (dbname 'msql_test_db', table_name 'tickets')
>
> Its a foreign table, referring to table 'tickets' defined on MySQL.
>
Isn't your original complaint about postgres_fdw? You can not tickets,
which is a mongo_fdw foreign table, is probably because mongo_fdw has
not implemented analyze FDW routine.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



--
--
Abbas
Architect
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter

@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars, whitepapers and more

Re: [HACKERS] An issue in remote query optimization

From
Tom Lane
Date:
Abbas Butt <abbas.butt@enterprisedb.com> writes:
> Postgres_fdw optimizes remote queries by pushing down the where clause.
> This feature does not work consistently when the query is executed from
> within a pl/pgsql function. The optimization works when the function
> executes the query for the first 5 times, and fails afterwards.

Well, it's switching to a generic plan.  Your first five executions look
like (ignoring the startup transient):

> 2017-01-31 00:39:25 PST LOG:  duration: 0.315 ms  plan:
>     Query Text: select b        from foreign_numbers where a=x
>     Foreign Scan on public.foreign_numbers  (cost=100.00..111.91 rows=1
> width=516)
>       Output: b
>       Remote SQL: SELECT b FROM public.numbers WHERE ((a = 2))

with actual durations ranging from 0.250 to 0.315 ms.  After that
you get generic plans:

> 2017-01-31 00:39:25 PST LOG:  duration: 0.251 ms  plan:
>     Query Text: select b        from foreign_numbers where a=x
>     Foreign Scan on public.foreign_numbers  (cost=100.00..114.91 rows=1
> width=516)
>       Output: b
>       Filter: (foreign_numbers.a = $3)
>       Remote SQL: SELECT a, b FROM public.numbers

with actual durations ranging from 0.223 to 0.251 ms.

So where's the slowdown, exactly?  It looks to me like the planner
has concluded that a custom plan is not enough better than generic
to justify repeated planning cost, and it looks to me like it was right.

> Note that the remote query does not contain the WHERE clause after the 5th
> invocation.

You'd have to take that up with the author of the FDW you're using.
Optimization of queries involving foreign tables is almost completely
on the head of the FDW, and this one seems not to know about pushing
down WHERE clauses that involve Param nodes.
        regards, tom lane