Thread: Foreign table performance issue / PostgreSQL vs. Oracle

Foreign table performance issue / PostgreSQL vs. Oracle

From
"Markhof, Ingolf"
Date:

Hi!

 

I am struggling with slow performance when running queries that refer to foreign tables. Yes, I know... please read the whole story!

 

The set-up basically is a production database and a reporting database. As names indicate, the production database is used for production, the reporting database is for analysis. On the reporting database, the only way to access product data is via foreign tables that link to the related production tables.

 

Now, while some queries on the reporting server run fine, some don't return any data even after hours.

 

However, the same set-up worked fine in Oracle before. Reporting wasn't always fast, but it delivered results in acceptable time. A query executed on the Oracle reporting server returns data in, e.g., 30 seconds. But the same query, translated to PostgreSQL and run on the PostgreSQL DB, does not deliver a single row after hours (!) of run time.

 

So, I wonder: Is there a fundamental difference between Oracle database links and foreign tables in PostgreSQL that could explain the different run times? Could there be some tuning option in PostgreSQL to make queries via foreign tables faster (e.g., I have heard about the fetch_size option)?

 

Your pointers are welcome!


Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des Aufsichtsrats: Francesco de Maio

Re: Foreign table performance issue / PostgreSQL vs. Oracle

From
Thomas Kellerer
Date:
Markhof, Ingolf wrote on 29.01.2021 at 13:56:
> The set-up basically is a production database and a reporting
> database. As names indicate, the production database is used for
> production, the reporting database is for analysis. On the reporting
> database, the only way to access product data is via foreign tables
> that link to the related production tables.
>  
> However, the same set-up worked fine in Oracle before. Reporting
> wasn't always fast, but it delivered results in acceptable time. A
> query executed on the Oracle reporting server returns data in e.g. 30
> seconds. But running the query translated to PostgreSQL on the
> PostgreSQL DB does not deliver a single row after hours (!) of run
> time.
>
> So, I wonder: Is there a fundamental difference between Oracle
> database links and foreign tables in PostgreSQL that could explain
> the different run times?

My guess is that your queries use predicates that Postgres can't push
down to the foreign server, but Oracle can.

What is your Postgres version?

If my assumption is correct, then showing one example query might make
it possible to figure out a way to restructure it.

Is logical replication an option?

How accurate does the data on the reporting server need to be?
Would using materialized views that "cache" the foreign table be an
option? That's obviously only suitable if you can live with somewhat
stale data and are fine with refreshing them maybe twice a day
(depending on how fast the refresh is).
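A minimal sketch of that approach on the reporting side (all names are illustrative; it assumes a foreign table orders_ft already exists there):

```sql
-- Cache the foreign table locally; the remote query runs once per
-- refresh instead of once per report.
CREATE MATERIALIZED VIEW orders_cache AS
SELECT * FROM orders_ft;

-- REFRESH ... CONCURRENTLY (non-blocking for readers) requires a
-- unique index on the materialized view:
CREATE UNIQUE INDEX ON orders_cache (order_id);

-- Run e.g. twice a day from a cron job:
REFRESH MATERIALIZED VIEW CONCURRENTLY orders_cache;
```

Queries on orders_cache then plan and execute like any local table, including index scans and parallelism.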




Re: Foreign table performance issue / PostgreSQL vs. Oracle

From
Tom Lane
Date:
Thomas Kellerer <shammat@gmx.net> writes:
> Markhof, Ingolf wrote on 29.01.2021 at 13:56:
>> So, I wonder: Is there a fundamental difference between Oracle
>> database links and foreign tables in PostgreSQL that could explain
>> the different run times?

> My guess is, that your queries use predicates that can't be pushed down
> to the foreign server on Postgres, but Oracle can.
> If my assumption is correct, then maybe if you showed one example query,
> it might be possible to figure out a way to restructure it.

More generally:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

These sorts of questions seldom have generic fixes, but if you give
us enough detail, we might be able to help.

            regards, tom lane



Re: Foreign table performance issue / PostgreSQL vs. Oracle

From
Sebastian Dressler
Date:
Hi Ingolf,

> On 29. Jan 2021, at 13:56, Markhof, Ingolf <ingolf.markhof@de.verizon.com> wrote:
> 
> Hi!
>  
> I am struggling with slow performance when running queries that refer to foreign tables. Yes, I know... please
> read the whole story!
 

Done and it rings a bell or two.

> The set-up basically is a production database and a reporting database. As names indicate, the production database is
> used for production, the reporting database is for analysis. On the reporting database, the only way to access product
> data is via foreign tables that link to the related production tables.
 
>  
> Now, while some queries on the reporting service run fine, some don't even return any data after hours.
>  
> However, the same set-up worked fine in Oracle before. Reporting wasn't always fast, but it delivered results in
> acceptable time. A query executed on the Oracle reporting server returns data in e.g. 30 seconds. But running the query
> translated to PostgreSQL on the PostgreSQL DB does not deliver a single row after hours (!) of run time.
 
>  
> So, I wonder: Is there a fundamental difference between Oracle database links and foreign tables in PostgreSQL that
> could explain the different run times? Could there be some tuning option in PostgreSQL to make queries via foreign
> tables faster (e.g. I heard about option fetch_size)?
 

You did not explicitly mention it, but I assume you are using postgres_fdw to connect from reporting (R) to production
(P). Thomas and Tom already mentioned incomplete/non-existing/non-applicable filter pushdowns. I want to add another
probable root cause to the list explaining the behavior you experience.
 

The postgres_fdw uses a CURSOR on P to execute the query. While this guarantees transaction safety, it also prohibits
parallelism (PostgreSQL server-side cursors enforce a sequential plan).
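What postgres_fdw effectively runs on P looks roughly like the following sketch (cursor name and columns are illustrative; the batch size is the fdw's fetch_size, default 100):

```sql
-- postgres_fdw wraps the remote query in a server-side cursor and
-- pulls rows in batches:
BEGIN;
DECLARE c1 CURSOR FOR SELECT id_a, id_b FROM public.table1;
FETCH 100 FROM c1;   -- repeated until the result set is exhausted
CLOSE c1;
COMMIT;
```

Since a query running behind a cursor may be suspended and resumed between fetches, the planner on P will not use parallel workers for it.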
 

As a result, depending on the size of tables, indexes, and filters pushed down (or not), this probably results in
slow-running queries. IMO, the worst case is a sequential table scan without any filtering, executed by a single
worker on the target.
 

Of course, you can try to optimize schemas on P and queries on R, enabling more filter pushdown and eventually faster
execution. However, I believe this will not work for your entire workload, i.e. there will always be performance gaps.
 

The parallelism issue is theoretically fixable by utilizing partitions on P. R then connects to P with multiple
postgres_fdw-backed child tables. However, this will only work with a patch to postgres_fdw implementing
"IsForeignScanParallelSafe" (see [1] for a possible implementation). Without this method, the partitions are
scanned sequentially and there is no parallelism, hence no performance gain.
 

I want to mention that there are proprietary options available for (re-)enabling PostgreSQL parallelism with cursors.
Such an extension can potentially fix your performance issue. However, I have not yet tried it with a setup similar
to yours.

Cheers,
Sebastian


[1]: https://github.com/swarm64/parallel-postgres-fdw-patch


--

Sebastian Dressler, Solution Architect, Swarm64 AS
+49 30 994 0496 72 | sebastian@swarm64.com


Re: Foreign table performance issue / PostgreSQL vs. Oracle

From
Jeremy Smith
Date:


Could there be some tuning option in PostgreSQL to make queries via foreign tables faster (e.g. I heard about option fetch_size)?


fetch_size can make a difference, but it won't turn a query that takes hours into one that takes seconds. The default is likely too low, though.

Have you analyzed the foreign table or set use_remote_estimate? With no statistics, Postgres may just be pulling the entire remote table.
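Both options could be tried with statements along these lines (server and table names are illustrative placeholders):

```sql
-- use_remote_estimate makes the local planner ask the remote server
-- for cost/row estimates (via EXPLAIN) at plan time; it can be set
-- per server or per foreign table:
ALTER SERVER some_server OPTIONS (ADD use_remote_estimate 'true');

-- Alternatively, collect local statistics for the foreign table.
-- Note that autovacuum does not refresh these, so ANALYZE must be
-- re-run periodically as the remote data changes:
ANALYZE some_foreign_table;
</imports>
```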

As others have mentioned, it would be useful to see your query.  There may be obvious issues, such as functions in the predicate, but without the query, we can only guess.

RE: Foreign table performance issue / PostgreSQL vs. Oracle

From
"Markhof, Ingolf"
Date:
Hi!

My PostgreSQL version is 11.8. 

The query I am running refers to a number of foreign tables. The first one (table1) has two IDs, let's say ID_A
and ID_B. While ID_A is unique, ID_B is not. In my case, I am pulling information for a value of ID_B for which
about 800 rows (with unique ID_A) exist. I found:
 

While

    select * from my_view where id_b='some value';

seemingly runs "forever" (I cancelled execution after a few hours), the following completes in about 1 hr:

    select * from my_view where id_a in (
        select id_a from table1 where id_b='some value'
    );

So, I tried smaller chunks of ID_A and found the execution time is non-linear with respect to the number of IDs.
For e.g. 50 ID_A's, it completed in about 12 sec.
 

Next, I split the ~800 ID_A's into chunks of 50 and submitted these 16 queries one after another. They all
completed in about 12 secs each.
 

I then found the option fetch_size, e.g. ALTER SERVER some_server OPTIONS (fetch_size '50000'). A chunk of 50 now
executes in 2 seconds (instead of 12 before).
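For reference, fetch_size can be set at the server level, as above, or per foreign table; a sketch using the names from this thread:

```sql
-- Server-wide default for all foreign tables of this server
-- (use SET instead of ADD if the option was already set):
ALTER SERVER some_server OPTIONS (ADD fetch_size '50000');

-- Or override it for a single foreign table only:
ALTER FOREIGN TABLE table1 OPTIONS (ADD fetch_size '50000');
```

A larger fetch_size reduces the number of FETCH round trips per query at the cost of higher memory use per batch.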
 

So, I found the "size" of the query has a serious impact on the execution time. I don't really understand why
executing 16*50 takes only 16*2 secs, but executing 1*800 takes about 3000 seconds...
 

Regards,
Ingolf



Re: Foreign table performance issue / PostgreSQL vs. Oracle

From
Sebastian Dressler
Date:
Hi Ingolf,

On 2. Feb 2021, at 13:05, Markhof, Ingolf <ingolf.markhof@de.verizon.com> wrote:

Hi!

My PostgreSQL version is 11.8. 

The query I am running refers to a number of foreign tables. The first one (table1) has two IDs, let's say ID_A and ID_B. While ID_A is unique, ID_B is not. In my case, I am pulling information for a value of ID_B for which about 800 rows (with unique ID_A) exist. I found:

While

select * from my_view where id_b='some value';

seemingly runs "forever" (I cancelled execution after a few hours), the following completes in about 1 hr:

select * from my_view where id_a in (
       select id_a from table1 where id_b='some value'
    );

So, I tried smaller chunks of ID_A and found the execution time is non-linear with respect to the number of IDs. For e.g. 50 ID_A's, it completed in about 12 sec.

[...]

I then found the option fetch_size, e.g. ALTER SERVER some_server OPTIONS (fetch_size '50000'). A chunk of 50 now executes in 2 seconds (instead of 12 before).

Thanks for the additional info. I tried to replicate this; you can find the GitHub Gist at [1]. Happy to hear your feedback on it.

What I can see from the execution plans in my example is that postgres_fdw pushes down the

    Remote SQL: SELECT id_a FROM public.a WHERE ((id_b = 1))

part to the remote. On the remote DB, this query can result in either an index-only scan, an index scan, or a full table scan. Which method is chosen depends on table size and indexes. Given the nature of postgres_fdw, this will in any case be executed as a sequential (non-parallel) query. For this part I would claim:

- If the planner expects few rows, it will choose an index lookup, which is a good thing because it effectively reduces the amount of data that needs to be queried. This would make it fast even though the plan is sequential.
- If the planner expects many rows, it might choose a sequential scan, which can be slow depending on the overall size of the table and on whether it is cached or not.

So, I found the "size" of the query has a serious impact on the execution time. I don't really understand why executing 16*50 takes only 16*2 secs, but executing 1*800 takes about 3000 seconds...

The mentioned fetch_size parameter has a positive effect because one can grab many more rows and return them at the same time. Worst case (and this is pure assumption), on each new fetch the query might be re-executed, and thus the runtime becomes much longer.

Further up in the plan, I see

    Remote SQL: SELECT id_a, id_b FROM public.a

which is the "SELECT * FROM my_view" part. Meaning, here it will definitely do a full table scan on the remote since it cannot push down the IN condition. I don't really see right now why this query is slower than your original form at all. In my experiment it is not, but maybe I am doing something wrong in the schema.

One thought would be, however, that the full table scan on the remote is more efficient than pushing down the filter, and thus it returns faster.

To really figure out more, I would suggest increasing the logging level on your remote server in order to see which queries are actually executed. Even better, maybe use auto_explain to capture plans and see whether these claims apply.
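A minimal auto_explain setup on the remote side could look like this (thresholds are illustrative; log_analyze adds measurable overhead, so enable it with care on production):

```
# postgresql.conf on the remote (production) server.
# Changing shared_preload_libraries requires a server restart.
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '10s'  # log plans of statements slower than 10s
auto_explain.log_analyze = on          # include actual row counts and timing
```

The plans then appear in the remote server log, showing exactly what the postgres_fdw cursor queries look like after pushdown.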

Best,
Sebastian


Sebastian Dressler, Solution Architect, Swarm64
+49 30 994 0496 72 | sebastian@swarm64.com