Thread: FDW, too long to run explain

FDW, too long to run explain

From
Vijaykumar Jain
Date:
Hi,

with pg v10.1

I have a setup enabled as below.
7 shards ( 1RW,  2 RO )
they all are fronted by FDW talking to each other.

we use writes directly to shards, and reads via FDW from all shards (RO)
our DB size is ~ 500GB each shard, and tables are huge too.
1 table ~ 200GB, 1 ~55GB, 1 ~40GB and a lot of small tables, but large indices on large table.

the sharding was done based on a key to enable shard isolation at app layer using a fact table.
select id,shard from fact_table;

server resources are,
32GB mem, 8 vcpu, 500GB SSD.

the FDW connect to each other shard via FDW fronted by haproxy -> pgbouncer -> postgresql.
Hope this is good enough background :)

now we have some long running queries via FDW that take minutes and get killed explain runs as idle in transaction on remote servers. (we set use_remote_estimate = true )
when the query is run on individual shards directly, it runs pretty quickly, 
but when run via FDW, it takes very long.
i even altered fetch_sie to 10000, so that in case some filters do not get pushed, those can be applied on the FDW quickly.

but i am lost at the understanding of why explain runs for ever via FDW.
we have a view on remote servers. we import public schema from remote servers, into coordinator custom schema, and then union all

select * from (
select * from sh01.view1
union all
select * from sh01.view1
...
) t where t.foo = 'bar' limit 10;


now the explain for 
select * from sh01.view1  keeps running for minutes sometimes,

then fetch too keeps running for minutes, although the total rows are < 10000 maybe.
idle in transaction | FETCH 10000 FROM c1

we have very aggressive  settings for autovacuum and auto analyze.

autovacuum_naptime = '15s'
autovacuum_vacuum_scale_factor = '0.001'
autovacuum_analyze_scale_factor = '0.005'
log_autovacuum_min_duration = '0'
maintenance_work_mem = '2GB'
autovacuum_vacuum_cost_limit = '5000'
autovacuum_vacuum_cost_delay = '5ms'


other questions:
also, what is the cost of fetch_size?
we have in our settings => use_remote_estimate=true,fetch_size=10000

I mean given we have a query

select * from foobar limit 10000; via FDW
limit 10000 does not get pushed.
so it seems all rows some to FDW node and then limit is applied?


i currently do not have the queries, but i have a screenshot for long running explain via FDW.
also since the whole query does not show up in pg_stat_statement, i am not sure, that would be of great help since predicate although applied, do not show up in pg_stat_activity.


I know, there can be more info i can provide, but  if anyone has experienced this, pls let me know.

BTW, i know citus is an option, but can we keep that option aside.

we see better ways to handle this in future, by sharding on ids and further partitioning of tables and parallel execution of FDW queries, but we need to know if this is a known issue of pg10 or i am doing something wrong which will bite in pg11 too.


Appreciate your help, always.


Regards,
Vijay

Re: FDW, too long to run explain

From
auxsvr
Date:
On Monday, 4 February 2019 09:14:14 EET Vijaykumar Jain wrote:
> Hi,

Hi,

> with pg v10.1

> we use writes directly to shards, and reads via FDW from all shards (RO)
> our DB size is ~ 500GB each shard, and tables are huge too.
> 1 table ~ 200GB, 1 ~55GB, 1 ~40GB and a lot of small tables, but large
> indices on large table.
> 
> the sharding was done based on a key to enable shard isolation at app layer
> using a fact table.
> select id,shard from fact_table;
> 
> server resources are,
> 32GB mem, 8 vcpu, 500GB SSD.
> 
> the FDW connect to each other shard via FDW fronted by haproxy -> pgbouncer
> -> postgresql.
> Hope this is good enough background :)
> 
> now we have some long running queries via FDW that take minutes and get
> killed explain runs as idle in transaction on remote servers. (we set
> use_remote_estimate = true )
> when the query is run on individual shards directly, it runs pretty
> quickly,
> but when run via FDW, it takes very long.
> i even altered fetch_sie to 10000, so that in case some filters do not get
> pushed, those can be applied on the FDW quickly.

In general, the plans via FDW are not the same as the ones running locally. We're having similar issues and the reason
seemsto be that queries via FDW are optimized for startup cost or few rows.
 

> Regards,
> Vijay

-- 
Regards,
Peter




Re: FDW, too long to run explain

From
auxsvr
Date:
Related to this question:

Postgresql cursors are in most cases I've tried extremely slow. The cause is as described in my previous answer, in my
experience.Is there any plan or way to improve this situation? For example, for FDW one would expect the plan on the
remoteside to be similar, if not identical, to the one locally, with the exception of the setup cost.
 
-- 
Regards,
Peter




Re: [External] Re: FDW, too long to run explain

From
Vijaykumar Jain
Date:
I am yet to figure out the reason, what we have done is implement fake columns to represent samples and giving them random numbers and keeping other bulls to fake limit.

Most of the queries that were impacted were the ones that did not push order by and limit to foreign servers.
I am also trying to upgrade pg11 to make use of parallelisation.
For now I am making use of materialised view on each shard and using predicates that get pushed directly to ensure a simple plan is created.
There is a compromise but this is what is reasonable for now.

On Sun, 17 Feb 2019 at 4:27 PM auxsvr <auxsvr@gmail.com> wrote:
Related to this question:

Postgresql cursors are in most cases I've tried extremely slow. The cause is as described in my previous answer, in my experience. Is there any plan or way to improve this situation? For example, for FDW one would expect the plan on the remote side to be similar, if not identical, to the one locally, with the exception of the setup cost.
--
Regards,
Peter



--

Regards,
Vijay

Re: FDW, too long to run explain

From
Jeff Janes
Date:
On Mon, Feb 4, 2019 at 2:15 AM Vijaykumar Jain <vjain@opentable.com> wrote:

now we have some long running queries via FDW that take minutes and get killed explain runs as idle in transaction on remote servers.

Are you saying the EXPLAIN itself gets killed, or execution of the plan generated based on the EXPLAIN (issued under use_remote_estimate = true) gets killed?  Who is doing the killing, the local side or the foreign side?  Can you include verbatim log entries for this?
 
now the explain for 
select * from sh01.view1  keeps running for minutes sometimes,

then fetch too keeps running for minutes, although the total rows are < 10000 maybe.
idle in transaction | FETCH 10000 FROM c1

What is this?  Is it from some monitoring tool, or pg_stat_activity, or what?  And is it on the local side or the foreign side?
 
other questions:
also, what is the cost of fetch_size?

It will always fetch rows from the foreign server in this sized chunks.  A larger fetch_size will have less network latency and computational overhead if many rows are going to be consumed, but also consume more memory on the local server as all rows are stored in memory per each chunk.  Also, in the case of a LIMIT, it reads a large number of rows even if most of them may be unneeded.  Conceptually, the LIMIT could be used to modify the FETCH downward to match the LIMIT, but that is not implemented.  In the case of a view over UNION ALL, I don't think the individual subqueries even know what the global LIMIT is.
 

I mean given we have a query

select * from foobar limit 10000; via FDW
limit 10000 does not get pushed.
so it seems all rows some to FDW node and then limit is applied?

It should not read all rows.  It should read as many multiples of fetch_size as needed, which should just be 1 multiple in this case.

Cheers,

Jeff

Re: [External] Re: FDW, too long to run explain

From
Vijaykumar Jain
Date:
Assuming your questions as 1,2,3, please find my answers below.

1)"explain" on foreign servers run as "idle in transactions". coz they were running very long (in the order of some minutes) , pgbouncer (in tx level pooling) setting kill them (as idle in tx time limit exceeded of 5 mins) or else results in too many connections piling up.
2)yes, i get those from pg_stat_activity, it truncates the full statement, but  it shows up as * EXPLAIN select col1, col2 .... *  00:00:44 | idle in transaction (this is just one of the screenshots i have). (on the foreign side)
3)yes, i think we kind of understood that part (fetch and memory), but i am not sure if that is used as any hint in plan generation too. i am sorry, i did not put auto explain on, on foreign servers, as that required a restart of the server. 

(this is the real content of the screenshot ,yes 13 mins), masking the colname and viewname
20678 | 00:13:38.990025 | EXPLAIN SELECT cols from view | idle in transaction

 the explain analyze of the same query on the foreign server is in ms.


I am sorry, i am vague about the queries in the email. i cannot reproduce it, as we do not have multiple shards of 500G in my qa environment and i cannot take dump of prod to test that in our test env coz of gdpr :)
but as i said in the mail, we were speculating since limit was not passed, the plans may have been bad. We tricked the foreign server by using a sample column to fake limit push down, and now have improved response times. We made vaccum/analyze very aggressive to ensure stats are never stale after large updates or deletes.

Unless someone can else reproduce, I guess, i'll close this mail. (I'll try to reproduce it myself again, but for now i have less data to share to convince anyone that happened.




Regards,
Vijay


On Sun, Feb 17, 2019 at 11:11 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Feb 4, 2019 at 2:15 AM Vijaykumar Jain <vjain@opentable.com> wrote:

now we have some long running queries via FDW that take minutes and get killed explain runs as idle in transaction on remote servers.

Are you saying the EXPLAIN itself gets killed, or execution of the plan generated based on the EXPLAIN (issued under use_remote_estimate = true) gets killed?  Who is doing the killing, the local side or the foreign side?  Can you include verbatim log entries for this?

explain on foreign servers run as "idle in transactions". coz they were running very long (in the order of some minutes) , pgbouncer setting kill them (as idle in tx time limit exceeded of 5 mins) or else results in too many connections piling up.

 
now the explain for 
select * from sh01.view1  keeps running for minutes sometimes,

then fetch too keeps running for minutes, although the total rows are < 10000 maybe.
idle in transaction | FETCH 10000 FROM c1

What is this?  Is it from some monitoring tool, or pg_stat_activity, or what?  And is it on the local side or the foreign side?
yes, pg_stat_activity, it truncates the full statement, but  it shows up as * EXPLAIN select col1, col2 .... *  00:00:44 | idle in transaction (this is just one of the screenshots i have). (on the foreign side)
 
 
other questions:
also, what is the cost of fetch_size?

It will always fetch rows from the foreign server in this sized chunks.  A larger fetch_size will have less network latency and computational overhead if many rows are going to be consumed, but also consume more memory on the local server as all rows are stored in memory per each chunk.  Also, in the case of a LIMIT, it reads a large number of rows even if most of them may be unneeded.  Conceptually, the LIMIT could be used to modify the FETCH downward to match the LIMIT, but that is not implemented.  In the case of a view over UNION ALL, I don't think the individual subqueries even know what the global LIMIT is.
Yep, i guess that is where i think the plan may have  
 

I mean given we have a query

select * from foobar limit 10000; via FDW
limit 10000 does not get pushed.
so it seems all rows some to FDW node and then limit is applied?

It should not read all rows.  It should read as many multiples of fetch_size as needed, which should just be 1 multiple in this case.
Yep, i think we kind of understood that part, but i am not sure if that is used to generate the plan too. i am sorry, i did not put auto explain on, on foreign servers, as that required a restart of the server. 

Cheers,

Jeff

Re: FDW, too long to run explain

From
Jeff Janes
Date:
On Sun, Feb 17, 2019 at 12:41 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Feb 4, 2019 at 2:15 AM Vijaykumar Jain <vjain@opentable.com> wrote:

now we have some long running queries via FDW that take minutes and get killed explain runs as idle in transaction on remote servers.

Are you saying the EXPLAIN itself gets killed, or execution of the plan generated based on the EXPLAIN (issued under use_remote_estimate = true) gets killed?  Who is doing the killing, the local side or the foreign side?  Can you include verbatim log entries for this?

After thinking about it a bit more, I think I see the issue here.  The EXPLAIN pursuant to use_remote_estimate is issued in the same remote transaction as the following DECLARE and FETCH's are.  But after the EXPLAIN is issued, the local server executes the query for a different FDW to satisfy some other branch of the UNION ALL, giving the first FDW connection time to do an idle-in-transaction timeout.  This happens even if no rows need to fetched from that FDW, because another branch of the UNION ALL satisfied the LIMIT.

A question for the PostgreSQL hackers would be, Is it necessary and desirable that the EXPLAIN be issued in the same transaction as the eventual DECLARE and FETCHes?  I don't think it is.  I guess if the foreign side table definition got changed between EXPLAIN and DECLARE it would cause problems, but changing the foreign side definition out of sync with the local side can cause problems anyway, so is that important to preserve?

Changing that might narrow but not completely fix the problem, as there might still be delays between the DECLARE and the FETCH or between successive FETCHes. 

So a question for you would be, why do have such an aggressive setting for idle_in_transaction_session_timeout that it causes this to happen?  Couldn't you relax it, perhaps just for the role used for the FDW connections?

Cheers,

Jeff

Re: FDW, too long to run explain

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> A question for the PostgreSQL hackers would be, Is it necessary and
> desirable that the EXPLAIN be issued in the same transaction as the
> eventual DECLARE and FETCHes?  I don't think it is.

It seems like a good idea to me.  I certainly don't think "I've got
an idle-in-transaction timeout on the remote that's shorter than my
local transaction runtime" is a plausible argument for changing that.
You could trip over that with a slow query regardless of whether we
separated the EXPLAIN step, just because there's no guarantee how
often we'll ask the FDW to fetch some rows.

> I guess if the foreign
> side table definition got changed between EXPLAIN and DECLARE it would
> cause problems, but changing the foreign side definition out of sync with
> the local side can cause problems anyway, so is that important to preserve?

I believe that the EXPLAIN will leave the remote transaction holding
AccessShareLock on the query's tables, meaning that doing it in one
transaction provides some positive protection against such problems,
which we'd lose if we changed this.

            regards, tom lane


Re: [External] Re: FDW, too long to run explain

From
Jeff Janes
Date:
On Sun, Feb 17, 2019 at 1:52 PM Vijaykumar Jain <vjain@opentable.com> wrote:
Assuming your questions as 1,2,3, please find my answers below.

1)"explain" on foreign servers run as "idle in transactions". coz they were running very long (in the order of some minutes) , pgbouncer (in tx level pooling) setting kill them (as idle in tx time limit exceeded of 5 mins) or else results in too many connections piling up.
2)yes, i get those from pg_stat_activity, it truncates the full statement, but  it shows up as * EXPLAIN select col1, col2 .... *  00:00:44 | idle in transaction (this is just one of the screenshots i have). (on the foreign side)

You are misinterpreting that data.  The EXPLAIN is not currently running.  It is the last statement that was running prior to the connection going idle-in-transaction.  See my just previous email--I think the reason it is idle is that the local is servicing some other part of the query (probably on a different FDW), and that is taking a long time.

Are all the connections piling up from postgres_fdw, or are many of them from other applications?  I think your timeout is just shifting symptoms around without fixing the underlying problem, while also making that underlying problem hard to diagnose.

 
3)yes, i think we kind of understood that part (fetch and memory), but i am not sure if that is used as any hint in plan generation too.

The query is planned as part of a cursor.  As such, it will use cursor_tuple_fraction as the "hint".  Perhaps you could tweak this parameter on the foreign side.  I think that a low setting for this parameter should give similar plans as a small LIMIT would give you, while large settings would give the same plans as a large (or no) LIMIT would.

I think postgres_fdw should pass does the LIMIT when it can do so, but it doesn't currently.

Cheers,

Jeff

Re: [External] Re: FDW, too long to run explain

From
Vijaykumar Jain
Date:
Hey Jeff,

yes, we now relaxed the idle in transaction setting to 15 mins.

i was hesitant to increase the settings as it blocked auto vaccum. We use hot_standby_feedback = true also as we split reads/writes and allow long running queries on read replicas, this too affects auto vaccum.
so overall, all the options i set to ensure auto vaccum gets triggered get impacted by increased idle in tx and hot_standby_feedback = true, both of which seem to be necessary for the setup now.
 
we have been trying to work with sharding using (mutli coordinator FDW) on our own (and have been successful although have hiccups), using directory based sharding in pg10. (if we cannot handle growth, all goes to mongo for its automatic sharding and failover)

I have to admit we can do better here though. we need to rebalance the data in the shards when we come close to 90% disk. those are long delete/upsert queries. We have very aggressive autovaccum to ensure we do not have a lot of stale stats.
I have plans to rearchitect the whole setup with pg11 where we plan to introduce time based sharding and then table partitioning in each shard further by time and also use Materialized views, for day old data with pre aggregated fields on each shard so that explain does not have to work too hard :)

and then create foreign tables and attach them as partitions. similar to https://github.com/MasahikoSawada/pgconf-asia-demo/tree/c47e25bf589c7d401c9d342329b400ec26eb61db

i guess, i am diverting the query, but just saying :) 
Thanks for suggestions and help Jeff. Appreciate it.

Regards,
Vijay


On Mon, Feb 18, 2019 at 12:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
> A question for the PostgreSQL hackers would be, Is it necessary and
> desirable that the EXPLAIN be issued in the same transaction as the
> eventual DECLARE and FETCHes?  I don't think it is.

It seems like a good idea to me.  I certainly don't think "I've got
an idle-in-transaction timeout on the remote that's shorter than my
local transaction runtime" is a plausible argument for changing that.
You could trip over that with a slow query regardless of whether we
separated the EXPLAIN step, just because there's no guarantee how
often we'll ask the FDW to fetch some rows.

> I guess if the foreign
> side table definition got changed between EXPLAIN and DECLARE it would
> cause problems, but changing the foreign side definition out of sync with
> the local side can cause problems anyway, so is that important to preserve?

I believe that the EXPLAIN will leave the remote transaction holding
AccessShareLock on the query's tables, meaning that doing it in one
transaction provides some positive protection against such problems,
which we'd lose if we changed this.

                        regards, tom lane

Re: [External] Re: FDW, too long to run explain

From
Vijaykumar Jain
Date:


Regards,
Vijay


On Mon, Feb 18, 2019 at 12:56 AM Jeff Janes <jeff.janes@gmail.com> wrote:
On Sun, Feb 17, 2019 at 1:52 PM Vijaykumar Jain <vjain@opentable.com> wrote:
Assuming your questions as 1,2,3, please find my answers below.

1)"explain" on foreign servers run as "idle in transactions". coz they were running very long (in the order of some minutes) , pgbouncer (in tx level pooling) setting kill them (as idle in tx time limit exceeded of 5 mins) or else results in too many connections piling up.
2)yes, i get those from pg_stat_activity, it truncates the full statement, but  it shows up as * EXPLAIN select col1, col2 .... *  00:00:44 | idle in transaction (this is just one of the screenshots i have). (on the foreign side)

You are misinterpreting that data.  The EXPLAIN is not currently running.  It is the last statement that was running prior to the connection going idle-in-transaction.  See my just previous email--I think the reason it is idle is that the local is servicing some other part of the query (probably on a different FDW), and that is taking a long time.
Ok, i raked this from the logs where enabled log_min_duration_statement = 10s

2019-01-31 12:48:18 UTC LOG:  duration: 29863.311 ms  statement: EXPLAIN SELECT blah, FROM public.view WHERE ((scheduled_bdt >= '2019-01-20'::date)) AND ((scheduled_bdt <= '2019-01-26'::date)) AND ((somekey = ANY ('{269029,123399,263164,261487}'::bigint[])))   (both the columns are indexed)

Are all the connections piling up from postgres_fdw, or are many of them from other applications?  I think your timeout is just shifting symptoms around without fixing the underlying problem, while also making that underlying problem hard to diagnose.
same application, but when more than one person is using the analytical tool that runs the underlying query. 

 
3)yes, i think we kind of understood that part (fetch and memory), but i am not sure if that is used as any hint in plan generation too.

The query is planned as part of a cursor.  As such, it will use cursor_tuple_fraction as the "hint".  Perhaps you could tweak this parameter on the foreign side.  I think that a low setting for this parameter should give similar plans as a small LIMIT would give you, while large settings would give the same plans as a large (or no) LIMIT would.

I think postgres_fdw should pass does the LIMIT when it can do so, but it doesn't currently.
As i already said, we have overcome the limit issue with a fake sample column in the huge tables. that way we limit the number of rows on the foreign server itself before the fetch. this is not the best and has its edge cases, but yeah, it works for now.
 

Cheers,

Jeff

Re: FDW, too long to run explain

From
auxsvr
Date:
On Sunday, 17 February 2019 20:58:47 EET Jeff Janes wrote:
> A question for the PostgreSQL hackers would be, Is it necessary and
> desirable that the EXPLAIN be issued in the same transaction as the
> eventual DECLARE and FETCHes?  I don't think it is.  I guess if the foreign
> side table definition got changed between EXPLAIN and DECLARE it would
> cause problems, but changing the foreign side definition out of sync with
> the local side can cause problems anyway, so is that important to preserve?

Won't separate transactions cause issues if the statistics of the table change in the meantime in a way that affects
theplan?
 

> Cheers,
> 
> Jeff
-- 
Regards,
Peter




Re: [External] Re: FDW, too long to run explain

From
Jeff Janes
Date:


On Sun, Feb 17, 2019 at 2:37 PM Vijaykumar Jain <vjain@opentable.com> wrote:

Ok, i raked this from the logs where enabled log_min_duration_statement = 10s

2019-01-31 12:48:18 UTC LOG:  duration: 29863.311 ms  statement: EXPLAIN SELECT blah, FROM public.view WHERE ((scheduled_bdt >= '2019-01-20'::date)) AND ((scheduled_bdt <= '2019-01-26'::date)) AND ((somekey = ANY ('{269029,123399,263164,261487}'::bigint[])))   (both the columns are indexed)

That is interesting.  Was that in the logs for the local or the foreign side?  And is it common, or rare?

If on the local side, could it be that the EXPLAINs sent to the foreign side are being made to wait by the connection pooler, leading to long delays?  If that is from the foreign side, then it should be conceptually unrelated to FDW.  Any chance you could reproduce the slowness in your test environment?  Slowness in the planner is probably related to the schema structure, not the data itself.

I don't think this would be related to the idle-in-transaction, except that one FDW connection maybe idle-in-transaction after its EXPLAIN is done because it is waiting for another FDW connection to slowly run its EXPLAIN.

Cheers,

Jeff

Re: [External] Re: FDW, too long to run explain

From
Vijaykumar Jain
Date:
Ok.
I’ll try to work on it this week and see if i am able to reproduce anything.

On Mon, 18 Feb 2019 at 2:30 AM Jeff Janes <jeff.janes@gmail.com> wrote:


On Sun, Feb 17, 2019 at 2:37 PM Vijaykumar Jain <vjain@opentable.com> wrote:

Ok, i raked this from the logs where enabled log_min_duration_statement = 10s

2019-01-31 12:48:18 UTC LOG:  duration: 29863.311 ms  statement: EXPLAIN SELECT blah, FROM public.view WHERE ((scheduled_bdt >= '2019-01-20'::date)) AND ((scheduled_bdt <= '2019-01-26'::date)) AND ((somekey = ANY ('{269029,123399,263164,261487}'::bigint[])))   (both the columns are indexed)

That is interesting.  Was that in the logs for the local or the foreign side?  And is it common, or rare?

If on the local side, could it be that the EXPLAINs sent to the foreign side are being made to wait by the connection pooler, leading to long delays?  If that is from the foreign side, then it should be conceptually unrelated to FDW.  Any chance you could reproduce the slowness in your test environment?  Slowness in the planner is probably related to the schema structure, not the data itself.

I don't think this would be related to the idle-in-transaction, except that one FDW connection maybe idle-in-transaction after its EXPLAIN is done because it is waiting for another FDW connection to slowly run its EXPLAIN.

Cheers,

Jeff
--

Regards,
Vijay

Re: [External] Re: FDW, too long to run explain

From
Jeff Janes
Date:
On Sun, Feb 17, 2019 at 6:32 AM Vijaykumar Jain <vjain@opentable.com> wrote:
I am yet to figure out the reason, what we have done is implement fake columns to represent samples and giving them random numbers and keeping other bulls to fake limit.

Most of the queries that were impacted were the ones that did not push order by and limit to foreign servers.
I am also trying to upgrade pg11 to make use of parallelisation.

postgres_fdw operates through declared cursors, and declared cursors inhibit parallel query.  This doesn't change in v11, see https://www.postgresql.org/docs/11/when-can-parallel-query-be-used.html

I'm not aware of any other changes in v11 that are likely to help you out.

Cheers,

Jeff

Re: [External] Re: FDW, too long to run explain

From
Vijaykumar Jain
Date:


Oh Wow, i guess you are right.
I just ran example where local runs make use of parallel setup, but not FDW.

i have three servers
2 x pg10
1 x pg11

i run queries on coordinator node ( pg11 ) which makes calls to foreign server to do a simple count.
the individual nodes run the query in parallel, the setup  is repeatable. but via FDW it runs a simple seq scan.
i guess this is for the same reason as you mentioned wrt declared cursors.


on pg11
create schema pg10;
create schema pg10_qa;
import foreign schema pg10  from server pg10 into pg10;
import foreign schema pg10_qa from server pg10_qa into pg10_qa;

explain (analyze,verbose) SELECT COUNT(1) FROM pg10.tbl_ItemTransactions;  ----this query is via FDW
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=108.53..152.69 rows=1 width=8) (actual time=6584.498..6584.500 rows=1 loops=1)
   Output: (count(1))
   Relations: Aggregate on (pg10.tbl_itemtransactions)
   Remote SQL: SELECT count(1) FROM pg10.tbl_itemtransactions
 Planning Time: 0.112 ms
 Execution Time: 6585.435 ms
(6 rows)

2019-02-18 09:56:48 UTC LOG:  duration: 6593.046 ms  plan:
Query Text: DECLARE c1 CURSOR FOR
SELECT count(1) FROM pg10.tbl_itemtransactions
Aggregate  (cost=768694.80..768694.81 rows=1 width=8) (actual time=6593.039..6593.039 rows=1 loops=1)
  Output: count(1)
  Buffers: shared hit=259476
  ->  Seq Scan on pg10.tbl_itemtransactions  (cost=0.00..666851.04 rows=40737504 width=0) (actual time=0.024..3389.245 rows=40737601 loops=1)
        Output: tranid, transactiondate, transactionname
        Buffers: shared hit=259476

--------

on pg10 (1) -- foreign server pg10
create schema pg10;
CREATE TABLE pg10.tbl_ItemTransactions
 (
     TranID SERIAL
     ,TransactionDate TIMESTAMPTZ
     ,TransactionName TEXT
 );
INSERT INTO pg10.tbl_ItemTransactions
(TransactionDate, TransactionName)
SELECT x, 'dbrnd'
FROM generate_series('2014-01-01 00:00:00'::timestamptz, '2016-08-01 00:00:00'::timestamptz,'2 seconds'::interval) a(x);

explain analyze SELECT count(1) FROM pg10.tbl_itemtransactions;  --this query is local 
                                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=472650.72..472650.73 rows=1 width=8) (actual time=2576.053..2576.054 rows=1 loops=1)
   ->  Gather  (cost=472650.50..472650.71 rows=2 width=8) (actual time=2575.721..2626.980 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=471650.50..471650.51 rows=1 width=8) (actual time=2569.302..2569.302 rows=1 loops=3)
               ->  Parallel Seq Scan on tbl_itemtransactions  (cost=0.00..429215.60 rows=16973960 width=0) (actual time=0.048..1492.144 rows=13579200 loops=3)
 Planning time: 0.405 ms
 Execution time: 2627.455 ms
(8 rows)
--------


on pg10 (2) -- foreign server pg10_qa
create schema pg10_qa;
CREATE TABLE pg10_qa.tbl_ItemTransactions
 (
     TranID SERIAL
     ,TransactionDate TIMESTAMPTZ
     ,TransactionName TEXT
 );
INSERT INTO pg10_qa.tbl_ItemTransactions
(TransactionDate, TransactionName)
SELECT x, 'dbrnd'
FROM generate_series('2014-01-01 00:00:00'::timestamptz, '2016-08-01 00:00:00'::timestamptz,'2 seconds'::interval) a(x);

explain analyze SELECT count(1) FROM pg10_qa.tbl_itemtransactions;  -- this query is local 
                                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=472650.72..472650.73 rows=1 width=8) (actual time=2568.469..2568.469 rows=1 loops=1)
   ->  Gather  (cost=472650.50..472650.71 rows=2 width=8) (actual time=2568.067..2613.006 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=471650.50..471650.51 rows=1 width=8) (actual time=2563.893..2563.893 rows=1 loops=3)
               ->  Parallel Seq Scan on tbl_itemtransactions  (cost=0.00..429215.60 rows=16973960 width=0) (actual time=0.017..1388.417 rows=13579200 loops=3)
 Planning time: 0.048 ms
 Execution time: 2613.246 ms
(8 rows)


but i guess partition elimination still works across the shards (see attached). atleast, we'll benefit from here :) in pg11.


Regards,
Vijay


On Mon, Feb 18, 2019 at 3:07 AM Jeff Janes <jeff.janes@gmail.com> wrote:
On Sun, Feb 17, 2019 at 6:32 AM Vijaykumar Jain <vjain@opentable.com> wrote:
I am yet to figure out the reason, what we have done is implement fake columns to represent samples and giving them random numbers and keeping other bulls to fake limit.

Most of the queries that were impacted were the ones that did not push order by and limit to foreign servers.
I am also trying to upgrade pg11 to make use of parallelisation.

postgres_fdw operates through declared cursors, and declared cursors inhibit parallel query.  This doesn't change in v11, see https://www.postgresql.org/docs/11/when-can-parallel-query-be-used.html

I'm not aware of any other changes in v11 that are likely to help you out.

Cheers,

Jeff
Attachment