Re: Bypassing cursors in postgres_fdw to enable parallel plans - Mailing list pgsql-hackers

From KENAN YILMAZ
Subject Re: Bypassing cursors in postgres_fdw to enable parallel plans
Date
Msg-id CAEV8nuk7Abj5Kh6X1SgeJo39fPNrt1O1LTXkw0nFAUpd5DoKFQ@mail.gmail.com
In response to Re: Bypassing cursors in postgres_fdw to enable parallel plans  (KENAN YILMAZ <kenan.yilmaz@localus.com.tr>)
List pgsql-hackers
Hi Rafia,
Sorry for the late response. I have completed my tests, and parallel workers were successfully launched on the remote server. Below are the details of my setup and test results.

# Machine Details
CPU: 4 cores
Memory: 8GB
PostgreSQL Version: 17.2 (compiled from source)
OS: Rocky Linux 8.10
Two VM instances

# PostgreSQL Configuration (For Demonstration)

logging_collector = on  
log_truncate_on_rotation = on  
log_rotation_size = 1GB  
log_filename = 'postgresql-%a.log'  
log_line_prefix = '%t [%p]: %q bg=%b, db=%d, usr=%u, client=%h, qryId=%Q, txId=%x, app=%a, line=%l'  
max_worker_processes = 4  
max_parallel_workers_per_gather = 2  
max_parallel_maintenance_workers = 2  
max_parallel_workers = 4  
debug_print_plan = on  
track_functions = 'all'  
log_statement = 'all'  
postgres_fdw.use_cursor = false  
shared_preload_libraries = 'pg_stat_statements,auto_explain'  
compute_query_id = on  
min_parallel_table_scan_size = 0  
parallel_tuple_cost = 0  
parallel_setup_cost = 0  
auto_explain.log_min_duration = '0.00ms'  
auto_explain.log_analyze = 'on'  

# Memory Settings
effective_cache_size = '4GB'  
shared_buffers = '1638MB'  
work_mem = '100MB'  

# Test Setup
# Creating pgbench Tables

$ pgbench -i -s 50 testdb  

# Running SQL Tests on the Local Machine (192.168.1.68)

psql> CREATE EXTENSION postgres_fdw;  
psql> CREATE SERVER fdwtest FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '192.168.1.69', dbname 'testdb');  
psql> CREATE USER MAPPING FOR postgres SERVER fdwtest OPTIONS (user 'postgres');  
psql> CREATE SCHEMA fdwtest;  
psql> IMPORT FOREIGN SCHEMA public FROM SERVER fdwtest INTO fdwtest;  

# Query 1: Counting Rows in a Foreign Table
testdb=# explain analyze select count(*) from fdwtest.pgbench_accounts where aid> 1000;
                                            QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────
 Foreign Scan  (cost=102.84..155.73 rows=1 width=8) (actual time=457.965..457.967 rows=1 loops=1)
   Relations: Aggregate on (pgbench_accounts)
 Planning Time: 0.661 ms
 Execution Time: 458.802 ms
(4 rows)

Time: 461.944 ms

# Query 2: Fetching a Single Row from Foreign Table
testdb=# explain analyze select aid from fdwtest.pgbench_accounts where aid> 1000 limit 1;
                                                    QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Foreign Scan on pgbench_accounts  (cost=100.00..100.24 rows=1 width=4) (actual time=9.335..9.336 rows=1 loops=1)
 Planning Time: 0.452 ms
 Execution Time: 10.304 ms
(3 rows)

Time: 12.605 ms

## PostgreSQL Logs
## Local Machine Logs (192.168.1.68) – Cropped

bg=client backend, db=testdb, usr=postgres, client=[local] , qryId=-6000156370405137929 , txId=0, app=psql, line=14 LOG:  duration: 457.971 ms  plan:
        Query Text: explain analyze select count(*) from fdwtest.pgbench_accounts where aid> 1000;
        Foreign Scan  (cost=102.84..155.73 rows=1 width=8) (actual time=457.965..457.967 rows=1 loops=1)
          Relations: Aggregate on (pgbench_accounts)
..
STATEMENT:  explain analyze select aid from fdwtest.pgbench_accounts where aid> 1000 limit 1;
bg=client backend, db=testdb, usr=postgres, client=[local] , qryId=5870070636604972000 , txId=0, app=psql, line=19 LOG:  duration: 9.339 ms  plan:
        Query Text: explain analyze select aid from fdwtest.pgbench_accounts where aid> 1000 limit 1;
        Foreign Scan on pgbench_accounts  (cost=100.00..100.24 rows=1 width=4) (actual time=9.335..9.336 rows=1 loops=1)

## Remote Machine Logs (192.168.1.69) – Cropped
STATEMENT:  SELECT count(*) FROM public.pgbench_accounts WHERE ((aid > 1000))
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=-7176633966431489392 , txId=0, app=postgres_fdw, line=22 LOG:  execute <unnamed>: SELECT count(*) FROM public.pgbench_accounts WHERE ((aid > 1000))
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=0 , txId=0, app=postgres_fdw, line=23 LOG:  statement: COMMIT TRANSACTION
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=-2835399305386018931 , txId=0, app=postgres_fdw, line=24 LOG:  duration: 455.461 ms  plan:
        Query Text: SELECT count(*) FROM public.pgbench_accounts WHERE ((aid > 1000))
        Finalize Aggregate  (cost=113216.98..113216.99 rows=1 width=8) (actual time=454.321..455.452 rows=1 loops=1)
          ->  Gather  (cost=113216.97..113216.98 rows=2 width=8) (actual time=454.173..455.443 rows=3 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                ->  Partial Aggregate  (cost=113216.97..113216.98 rows=1 width=8) (actual time=449.393..449.394 rows=1 loops=3)
                      ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..108009.67 rows=2082920 width=0) (actual time=0.255..343.378 rows=1666333 loops=3)
                            Filter: (aid > 1000)
                            Rows Removed by Filter: 333
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=0 , txId=0, app=postgres_fdw, line=25 LOG:  statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
..
STATEMENT:  SELECT aid FROM public.pgbench_accounts WHERE ((aid > 1000)) LIMIT 1::bigint
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=5994602644362067232 , txId=0, app=postgres_fdw, line=29 LOG:  execute <unnamed>: SELECT aid FROM public.pgbench_accounts WHERE ((aid > 1000)) LIMIT 1::bigint
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=0 , txId=0, app=postgres_fdw, line=30 LOG:  statement: COMMIT TRANSACTION
bg=client backend, db=testdb, usr=postgres, client=192.168.1.68 , qryId=-2835399305386018931 , txId=0, app=postgres_fdw, line=31 LOG:  duration: 7.983 ms  plan:
        Query Text: SELECT aid FROM public.pgbench_accounts WHERE ((aid > 1000)) LIMIT 1::bigint
        Limit  (cost=0.00..0.02 rows=1 width=4) (actual time=0.836..7.974 rows=1 loops=1)
          ->  Gather  (cost=0.00..108009.67 rows=4999007 width=4) (actual time=0.834..7.972 rows=1 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..108009.67 rows=2082920 width=4) (actual time=0.270..0.271 rows=1 loops=3)
                      Filter: (aid > 1000)
                      Rows Removed by Filter: 333



If you would like me to conduct more complex tests, feel free to let me know.
Best regards,
Kenan YILMAZ



KENAN YILMAZ <kenan.yilmaz@localus.com.tr> wrote on Mon, 17 Feb 2025 at 17:09:
Hi Rafia,

Following our previous private discussion, thanks for the update and for clarifying the current state of the patch.
I understand that more substantial changes are on the way, so I'll focus on relevant test scenarios rather than on performance testing at this stage.

I will proceed with expanding the scenarios, including:

Multiple postgres_fdw servers are active at the same time
Multiple connections from the same postgres_fdw server are active concurrently
Multiple transactions run simultaneously on a single connection
Multiple sessions operate from a single active connection

I will submit the results of these tests to this mail thread so that they can benefit the broader community as well. Additionally, once you publish the updated version of the patch, I will rerun the tests with the latest changes and share the updated results.

Best Regards,

Rafia Sabih <rafia.pghackers@gmail.com> wrote on Mon, 17 Feb 2025 at 16:46:
On Tue, 14 Jan 2025 at 18:33, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 6, 2025 at 3:52 AM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
> Now, to overcome this limitation, I have worked on this idea (suggested by my colleague Bernd Helmle) of bypassing the cursors. The way it works is as follows:
> there is a new GUC, postgres_fdw.use_cursor, which when set to false selects the mode without cursors. In that mode, create_cursor now calls PQsetChunkedRowsMode, with the chunk size equal to fetch_size. Then, in fetch_more_data, pgfdw_get_next_result is used to obtain each chunk as a PGresult, which is processed in the same manner as before.
>
> Now, the issue comes with simultaneous queries, which is the case for join queries where the join of the foreign tables is performed at the local server. With cursors, that meant multiple cursors open at the same time; without the cursor mechanism, we have no information or other structure to know what to fetch for which query. To handle that case, there is a flag only_query, which is unset in postgresBeginForeignScan as soon as a cursor_number >= 2 is assigned. Then, in fetch_more_data, when we find that only_query is unset, we fetch all the data for the query and store it in a Tuplestore. These tuples are then transferred to fsstate->tuples and processed as usual.
>
> So yes, there is a performance drawback in the case of simultaneous queries; however, the ability to use parallel plans is a real added advantage for users. Plus, we can keep things as before with the new GUC -- use_cursor -- in case some workloads lose more than they gain. So, in short, I feel hopeful that this could be a good idea and a good time to improve postgres_fdw.
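
For readers who have not used the chunked-rows API yet, the flow described in the quoted paragraphs corresponds roughly to the following standalone libpq (v17+) sketch. It is only an illustration: the connection string, query text, and fetch_size value are made up, and this is not the patch's actual create_cursor/fetch_more_data code, which operates on the FDW scan state rather than a simple loop.

/*
 * Minimal libpq (v17+) sketch of non-cursor chunked fetching.
 * Build with: cc chunked.c -lpq
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn   *conn = PQconnectdb("dbname=testdb");   /* example connection string */
    int       fetch_size = 100;                      /* plays the role of postgres_fdw's fetch_size */
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* Send the remote query directly -- no DECLARE CURSOR / FETCH round trips. */
    if (!PQsendQuery(conn, "SELECT aid FROM public.pgbench_accounts WHERE aid > 1000"))
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* Ask libpq to return the result in chunks of at most fetch_size rows. */
    if (!PQsetChunkedRowsMode(conn, fetch_size))
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /*
     * Each PGRES_TUPLES_CHUNK result carries up to fetch_size rows;
     * a final zero-row PGRES_TUPLES_OK result ends the stream.
     */
    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_TUPLES_CHUNK)
            printf("got a chunk of %d rows\n", PQntuples(res));
        else if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "%s", PQerrorMessage(conn));
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}

The point of the non-cursor mode is visible here: the remote query is a plain SELECT, so the remote planner is free to choose a parallel plan, while the local side still consumes the result fetch_size rows at a time.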

Hi,

I think it might have been nice to credit me in this post, since I
made some relevant suggestions here off-list, in particular the idea
of using a Tuplestore when there are multiple queries running. But I
don't think this patch quite implements what I suggested. Here, you
have a flag only_query which gets set at some point in time and
thereafter keeps that value for the lifetime of the session. That
means, I think, that all future queries will use the tuplestore even
though there might not be multiple queries running any more, which
doesn't seem like what we want. And, actually, this looks like it
will happen as soon as you reach the second query in the same
transaction, even if the two queries don't overlap. I think what you
want to do is test, at the point where we would need to issue a new
query, whether an existing query is already running. If so, move that
query's remaining results into a Tuplestore so you can issue the new
query.

I'm not sure what the best way to implement that is, exactly. Perhaps
fsstate->conn_state needs to store some more details about the
connection, but that's just a guess. I don't think a global variable
is what you want. Not only is that session-lifetime, but it applies
globally to every connection to every server. You want to test
something that is specific to one connection to one server, so it
needs to be part of a data structure that is scoped that way.
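
To make that concrete, here is a small self-contained toy model (plain C, independent of the real postgres_fdw structures) of per-connection bookkeeping along those lines: the overlap check lives in a per-connection struct rather than a global variable, and an already-running scan is drained into a buffer only when a second scan actually needs the same connection. All names here (ConnState, ScanState, drain_to_buffer) are hypothetical stand-ins.

#include <stdio.h>

typedef struct ScanState ScanState;

typedef struct ConnState
{
    ScanState *active_scan;     /* scan currently streaming on this connection, or NULL */
} ConnState;

struct ScanState
{
    ConnState  *conn;
    const char *query;
    int         buffered;       /* nonzero once remaining rows were spilled to a buffer */
};

/* Stand-in for "move this scan's pending rows into a Tuplestore". */
static void
drain_to_buffer(ScanState *scan)
{
    scan->buffered = 1;
    printf("draining \"%s\" into a buffer\n", scan->query);
}

/* Called at the point where a new remote query must be issued on conn. */
static void
begin_scan(ConnState *conn, ScanState *scan, const char *query)
{
    if (conn->active_scan != NULL)          /* another scan still owns the connection */
        drain_to_buffer(conn->active_scan);

    scan->conn = conn;
    scan->query = query;
    scan->buffered = 0;
    conn->active_scan = scan;
    printf("issuing \"%s\"\n", query);
}

static void
end_scan(ScanState *scan)
{
    if (scan->conn->active_scan == scan)
        scan->conn->active_scan = NULL;
}

int
main(void)
{
    ConnState conn = {0};
    ScanState outer, inner;

    /* Two overlapping scans on one connection, as in a join that is not pushed down. */
    begin_scan(&conn, &outer, "SELECT ... FROM t1");
    begin_scan(&conn, &inner, "SELECT ... FROM t2");    /* t1's pending rows are drained first */

    end_scan(&inner);
    end_scan(&outer);
    return 0;
}

The property being illustrated is only the scoping: the state belongs to one connection, so other connections and other servers are unaffected.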

I think you'll want to figure out a good way to test this patch. I'm
not yet sure whether it needs automated test cases or whether they
can reasonably be written, but you at least want to have a good
procedure for manual validation, so that you can verify that the
Tuplestore is used in all the cases where it needs to be and,
hopefully, no others.

--
Robert Haas
EDB: http://www.enterprisedb.com

Indeed you are right.
Firstly, please accept my apologies for not crediting you for this work. Thanks a lot for your efforts; the discussions with you were helpful in shaping this patch and bringing it to this level.

Next, yes, the last version used the tuplestore for all queries in the same transaction once the second query had been reached. To overcome this, I now identify whether any other query is running simultaneously with the current one: there is an int variable num_queries which is incremented at every call of postgresBeginForeignScan and decremented at every call of postgresEndForeignScan. This way, if simultaneous queries are running, num_queries is greater than 1, and the tuplestore is used only in that case. So, basically, the understanding here is that if postgresBeginForeignScan is called again before the matching call of postgresEndForeignScan, then the queries are simultaneous.
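
As a toy, self-contained sketch of that bookkeeping (not the patch's code; in particular the counter here is just a file-level variable for illustration, and the scope the patch actually uses may differ):

#include <stdio.h>

static int num_queries = 0;     /* how many foreign scans are currently open */

static void
begin_foreign_scan(void)        /* models postgresBeginForeignScan */
{
    num_queries++;
}

static void
end_foreign_scan(void)          /* models postgresEndForeignScan */
{
    num_queries--;
}

/* Decision made in fetch_more_data: stream chunks directly, or spill to a tuplestore. */
static int
must_use_tuplestore(void)
{
    return num_queries > 1;
}

int
main(void)
{
    begin_foreign_scan();       /* first scan of a join */
    printf("one scan open:  use tuplestore? %d\n", must_use_tuplestore());

    begin_foreign_scan();       /* second scan begins before the first ends */
    printf("two scans open: use tuplestore? %d\n", must_use_tuplestore());

    end_foreign_scan();
    end_foreign_scan();
    return 0;
}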

I couldn't really find an automated method of testing this, but I did it manually by debugging and/or adding log statements in postgresBeginForeignScan, postgresEndForeignScan, and fetch_more_data to confirm that there are indeed simultaneous queries and that only they use the tuplestore. The case of simultaneous queries I found was the join query: it creates the cursor for one side of the join and retrieves its first tuples, then creates the cursor for the other side of the join and reads all the tuples for that query, and then comes back to the first cursor and retrieves all of its tuples. It works similarly for queries with n tables in the join; basically, what I found is that with n tables in the join there are n open cursors at a time, which are then closed one by one in descending order of cursor_number. I will think more about how to test this and will try to come up with a script (in the best case) to confirm the use of the tuplestore in the required cases only, or at least with a set of steps to do so.

For the regular testing of this feature, I think a regression run with the new GUC postgres_fdw.use_cursor set to false, re-using all the existing postgres_fdw tests, should suffice. What do you think? However, at the moment the regression tests fail when non-cursor mode is used. Some queries only need an ORDER BY because the row order changes in non-cursor mode, but some require more complex changes; I am working on them.

In this version of the patch I have added only the changes mentioned above and not the regression test modification.

--
Regards,
Rafia Sabih
CYBERTEC PostgreSQL International GmbH
